DEVELOPMENT OF AN IN SILICO PREDICTIVE MODEL FOR ESTIMATING MOLECULAR TOXICITY USING PUBLIC DATA AND ARTIFICIAL INTELLIGENCE

Authors

Casimiro Waete Agostinho, Grazielly Honorio Rodrigues de Freitas, John Henrique Soares Costa, Maria Eduarda de Melo Pretes, William Argolo Saliba

DOI:

https://doi.org/10.51891/rease.v12i1.23728

Keywords:

Molecular toxicity. Artificial intelligence. In silico predictive model.

Abstract

This article aimed to develop an in silico predictive model, implemented in Python, to estimate the toxicity of small organic molecules using public data and artificial intelligence techniques. A dataset of 200 molecules containing up to 10 carbon atoms was built from the PubChem repository, prioritizing halogenated and amino compounds structurally related to chloramines and halomethanes. Structural and physicochemical descriptors were extracted for each molecule, including molar mass, chain type, number of halogens, halogen/carbon ratio, aliphatic and aromatic rings, chiral carbons, and main organic function, together with a binary toxicity target variable. Modeling was performed in Google Colab using Random Forest and logistic regression, with class imbalance handled by SMOTENC and performance assessed by a 70/30 holdout split and stratified cross-validation. Random Forest showed the better overall performance (accuracy 0.9333; balanced accuracy 0.8693; ROC-AUC 0.9673), whereas logistic regression achieved the highest recall (0.9804) and was easier to interpret: it associated higher risk with halogenation and aromaticity, and lower risk with aliphatic rings and a greater number of hydrogens bound to nitrogen. We conclude that the proposed pipeline is promising for preliminary toxicological screening, although expanding the dataset and validating the model externally are essential to increase its robustness and generalizability.
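Some of the descriptors named in the abstract (molar mass, number of halogens, halogen/carbon ratio) can be derived directly from a molecular formula. The following is a minimal sketch of that idea using only the Python standard library; it is not the authors' extraction code. The function names `parse_formula` and `molecular_descriptors`, the dictionary keys, and the choice to return `None` for the ratio when a molecule has no carbon atoms (as with monochloramine) are all assumptions made for illustration.

```python
import re

# Standard atomic masses in g/mol for the few elements this sketch supports.
ATOMIC_MASS = {
    "C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999,
    "F": 18.998, "Cl": 35.45, "Br": 79.904, "I": 126.904,
}
HALOGENS = {"F", "Cl", "Br", "I"}
MAX_CARBONS = 10  # dataset limit stated in the abstract


def parse_formula(formula):
    """Count atoms in a flat formula such as 'CHCl3' (no parentheses or charges)."""
    if not re.fullmatch(r"(?:[A-Z][a-z]?\d*)+", formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    counts = {}
    for symbol, digits in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if symbol not in ATOMIC_MASS:
            raise ValueError(f"unsupported element: {symbol}")
        counts[symbol] = counts.get(symbol, 0) + (int(digits) if digits else 1)
    return counts


def molecular_descriptors(formula):
    """Return a few formula-level descriptors (hypothetical helper, illustrative keys)."""
    counts = parse_formula(formula)
    n_carbons = counts.get("C", 0)
    n_halogens = sum(n for el, n in counts.items() if el in HALOGENS)
    molar_mass = sum(ATOMIC_MASS[el] * n for el, n in counts.items())
    return {
        "molar_mass": round(molar_mass, 3),
        "n_carbons": n_carbons,
        "n_halogens": n_halogens,
        # Assumption: the ratio is left undefined (None) when there is no carbon.
        "halogen_carbon_ratio": n_halogens / n_carbons if n_carbons else None,
        "within_carbon_limit": n_carbons <= MAX_CARBONS,
    }


if __name__ == "__main__":
    # Chloroform (a halomethane) and monochloramine as example inputs.
    for formula in ("CHCl3", "NH2Cl"):
        print(formula, molecular_descriptors(formula))
```

Descriptors that depend on molecular structure rather than composition (chain type, ring counts, chiral carbons, main organic function) cannot be recovered from a formula alone and would need a structure representation such as the records PubChem provides.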

Author Biographies

Casimiro Waete Agostinho, Centro Universitário Única

Student in the Data Science and Artificial Intelligence program at Centro Universitário Única.

Grazielly Honorio Rodrigues de Freitas, Centro Universitário Única

Student in the Chemistry program at Centro Universitário Única.

John Henrique Soares Costa, Centro Universitário Única

Student in the Pharmacy program at Centro Universitário Única.

Maria Eduarda de Melo Pretes, Centro Universitário Única

Student in the Pharmacy program at Centro Universitário Única.

William Argolo Saliba, Centro Universitário Única

Faculty member at Centro Universitário Única (UNIÚNICA) and supervising professor.

Published

2026-01-20

How to Cite

Agostinho, C. W., Freitas, G. H. R. de, Costa, J. H. S., Pretes, M. E. de M., & Saliba, W. A. (2026). DEVELOPMENT OF AN IN SILICO PREDICTIVE MODEL FOR ESTIMATING MOLECULAR TOXICITY USING PUBLIC DATA AND ARTIFICIAL INTELLIGENCE. Revista Ibero-Americana De Humanidades, Ciências E Educação, 12(1), 1–15. https://doi.org/10.51891/rease.v12i1.23728