Computer‐aided prediction of toxicity with substructure pattern and random forest
Author(s) -
Cao DongSheng,
Yang YanNing,
Zhao JianChao,
Yan Jun,
Liu Shao,
Hu QianNan,
Xu QingSong,
Liang YiZeng
Publication year - 2012
Publication title -
Journal of Chemometrics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.47
H-Index - 92
eISSN - 1099-128X
pISSN - 0886-9383
DOI - 10.1002/cem.1416
Subject(s) - substructure , random forest , computer science , chemical space , data mining , data set , in silico , set (abstract data type) , quantitative structure–activity relationship , chemical toxicity , toxicity , class (philosophy) , artificial intelligence , biological system , machine learning , chemistry , bioinformatics , drug discovery , engineering , biology , gene , biochemistry , structural engineering , organic chemistry , programming language
The toxicity of chemicals, induced by different factors, is an important consideration, especially during drug research and development. Thus, there is an urgent need to develop computationally effective models that can predict the toxicity or adverse effects of a specific class of chemicals. In this study, random forest (RF) was used to classify five toxicity data sets from the Distributed Structure‐Searchable Toxicity (DSSTox) database network, using substructure fingerprints calculated directly from simple molecular structures. Three model validation approaches (out‐of‐bag validation built into RF, fivefold cross‐validation, and an independent validation set) were used to assess the prediction capability of the models. The chemical space of the data sets was explored with multidimensional scaling plots, and outlying molecules were detected with the RF proximity measure. In addition, the important substructure fingerprints identified by RF gave some insight into the structural features related to the toxicity of chemicals. The results showed that these in silico classification models, built from substructure patterns and RF, are applicable to potential toxicity prediction of chemical compounds. Copyright © 2012 John Wiley & Sons, Ltd.
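The workflow summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' code: it assumes scikit-learn and NumPy, replaces real substructure fingerprints with random binary vectors, and uses a made-up toxicity rule (bits 3, 7, and 11) purely so that the example has something to learn. The forest size, the split ratio, and the use of classical MDS on 1 minus proximity are likewise illustrative choices, not values taken from the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for substructure fingerprints: 300 molecules x 50 bits.
# (Assumption: real fingerprints would come from a cheminformatics toolkit.)
X = rng.integers(0, 2, size=(300, 50))
# Hypothetical label: "toxic" if bits 3 and 7 are both set, or bit 11 is set.
y = ((X[:, 3] & X[:, 7]) | X[:, 11]).astype(int)

# Hold out an independent validation set (20% here, an arbitrary choice).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# 1) Out-of-bag validation, built into the forest.
rf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
rf.fit(X_train, y_train)
oob_accuracy = rf.oob_score_

# 2) Fivefold cross-validation on the training data.
cv_accuracy = cross_val_score(
    RandomForestClassifier(n_estimators=200, random_state=0),
    X_train, y_train, cv=5,
).mean()

# 3) Independent validation set.
test_accuracy = rf.score(X_test, y_test)

# Most important fingerprint bits according to the forest.
top_bits = np.argsort(rf.feature_importances_)[::-1][:3]

# RF proximity: fraction of trees in which two molecules share a leaf.
leaves = rf.apply(X_train)  # shape (n_samples, n_trees)
proximity = (leaves[:, None, :] == leaves[None, :, :]).mean(axis=2)

# Classical MDS on the dissimilarity 1 - proximity (2-D embedding).
d = 1.0 - proximity
n = d.shape[0]
J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
B = -0.5 * J @ (d ** 2) @ J                  # double-centered Gram matrix
eigvals, eigvecs = np.linalg.eigh(B)
order = np.argsort(eigvals)[::-1][:2]        # two largest eigenvalues
coords = eigvecs[:, order] * np.sqrt(np.maximum(eigvals[order], 0.0))

# Simple outlier score: molecules with low mean proximity to all others.
outlier_score = 1.0 - proximity.mean(axis=1)

print(f"OOB={oob_accuracy:.3f}  CV={cv_accuracy:.3f}  test={test_accuracy:.3f}")
print("top fingerprint bits:", top_bits)
```

In this sketch, `coords` could be passed to any plotting library to draw a chemical space map, and molecules with the largest `outlier_score` would be the candidates to inspect as outliers. With real data, the three accuracy estimates would be compared to check that the model generalizes consistently.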