DEVELOPMENT OF AN IN SILICO PREDICTIVE MODEL FOR ESTIMATING MOLECULAR TOXICITY USING PUBLIC DATA AND ARTIFICIAL INTELLIGENCE
DOI:
https://doi.org/10.51891/rease.v12i1.23728
Keywords:
Molecular toxicity. Artificial intelligence. In silico predictive model.
Abstract
This article aimed to develop an in silico predictive model, implemented in Python, to estimate the toxicity of small organic molecules using public data and Artificial Intelligence techniques. A dataset of 200 molecules containing up to 10 carbon atoms was constructed from the PubChem repository, prioritizing halogenated and amino compounds structurally related to chloramines and halomethanes. Structural and physicochemical descriptors were extracted, including molar mass, chain type, number of halogens, halogen/carbon ratio, aliphatic and aromatic rings, chiral carbons, and main organic function, in addition to a binary toxicity target variable. Modeling was performed in Google Colab using Random Forest and logistic regression, with class imbalance handled by SMOTENC and performance assessed via holdout (70/30) and stratified cross-validation. Random Forest showed superior overall performance (accuracy 0.9333; balanced accuracy 0.8693; ROC-AUC 0.9673), whereas logistic regression maximized recall (0.9804) and provided greater interpretability, indicating higher risk associated with halogenation and aromaticity and a protective effect of aliphatic rings and a greater number of hydrogens bound to nitrogen. It is concluded that the proposed pipeline is promising for preliminary toxicological screening, although expansion and external validation of the dataset are essential to increase model robustness and generalizability.
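As an illustration of the descriptor step, the number of halogens and the halogen/carbon ratio can be derived from a flat molecular formula of the kind PubChem reports. The abstract does not state how the descriptors were actually computed, so the following is only a minimal sketch under that assumption; the names `parse_formula` and `halogen_descriptors` are hypothetical and are not taken from the article.

```python
import re
from collections import Counter

# Halogens considered for the descriptor (assumption: F, Cl, Br, I only).
HALOGENS = {"F", "Cl", "Br", "I"}

# One element symbol (capital letter plus optional lowercase letter)
# followed by an optional integer count.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula: str) -> Counter:
    """Count atoms in a flat formula such as 'CHCl3'.

    Parentheses, charges, and isotope labels are not handled.
    """
    counts = Counter()
    for symbol, digits in _TOKEN.findall(formula):
        counts[symbol] += int(digits) if digits else 1
    return counts


def halogen_descriptors(formula: str) -> dict:
    """Return carbon count, halogen count, and halogen/carbon ratio."""
    counts = parse_formula(formula)
    n_carbon = counts["C"]
    n_halogen = sum(counts[x] for x in HALOGENS)
    # The ratio is undefined for carbon-free species such as monochloramine.
    ratio = n_halogen / n_carbon if n_carbon else None
    return {
        "n_carbon": n_carbon,
        "n_halogen": n_halogen,
        "halogen_carbon_ratio": ratio,
    }


if __name__ == "__main__":
    for formula in ("CHCl3", "CH2BrCl", "C6H5Cl", "NH2Cl"):
        print(formula, halogen_descriptors(formula))
```

A carbon-free molecule yields `None` for the ratio rather than a division error; how such cases were treated in the original dataset is not stated in the abstract.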
License
Attribution CC BY