In Silico Estimation of Chemical Carcinogenicity with Binary and Ternary Classification Methods
Author(s) - Li Xiao, Du Zheng, Wang Jie, Wu Zengrui, Li Weihua, Liu Guixia, Shen Xu, Tang Yun
Publication year - 2015
Publication title - Molecular Informatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.481
H-Index - 68
eISSN - 1868-1751
pISSN - 1868-1743
DOI - 10.1002/minf.201400127
Subject(s) - carcinogen, ternary operation, binary number, in silico, chemistry, computer science, data mining, mathematics, organic chemistry, biochemistry, arithmetic, gene, programming language
Carcinogenicity is among the chemical properties of greatest concern for human health, so it is important to identify carcinogenic chemicals as early as possible. In this study, 829 diverse compounds with rat carcinogenicity data were collected from the Carcinogenic Potency Database (CPDB). Using six types of fingerprints to represent the molecules and five machine learning methods, 30 binary and ternary classification models were generated to predict chemical carcinogenicity. The models were evaluated on an external validation set of 87 chemicals from the ISSCAN database. The best binary model, built with MACCS keys and the kNN algorithm, reached a predictive accuracy of 83.91 %; the best ternary model, also built with MACCS keys and kNN, reached an overall accuracy of 80.46 %. The best binary and ternary models were then used to estimate the carcinogenicity of 2251 tobacco smoke components. The binary model predicted 981 compounds as carcinogens, while the ternary model predicted 110 as strong carcinogens and 807 as weak carcinogens. These results indicate that the models could be helpful for predicting chemical carcinogenicity.
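To illustrate the kind of model the abstract describes, the following is a minimal sketch of k-nearest-neighbour classification over binary fingerprints. It is not the authors' implementation. The Tanimoto similarity measure, the value of k, the function names, and the toy fingerprints (sets of on-bit positions standing in for MACCS keys) are all assumptions made for illustration; the abstract does not state them.

```python
from collections import Counter


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bit positions."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def knn_predict(query: frozenset, training: list, k: int = 3) -> str:
    """Majority vote among the k training fingerprints most similar to query."""
    ranked = sorted(training, key=lambda item: tanimoto(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def accuracy(predicted: list, actual: list) -> float:
    """Fraction of predictions that match the reference labels."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical toy data, NOT from the paper: each fingerprint is the set of
# bit positions that are switched on for one molecule.
TRAIN = [
    (frozenset({1, 2, 3, 4}), "carcinogen"),
    (frozenset({1, 2, 3, 5}), "carcinogen"),
    (frozenset({2, 3, 4, 6}), "carcinogen"),
    (frozenset({10, 11, 12}), "non-carcinogen"),
    (frozenset({10, 11, 13}), "non-carcinogen"),
    (frozenset({11, 12, 14}), "non-carcinogen"),
]
QUERIES = [frozenset({1, 2, 3}), frozenset({10, 11, 12, 13})]
EXPECTED = ["carcinogen", "non-carcinogen"]

predictions = [knn_predict(q, TRAIN, k=3) for q in QUERIES]
print(predictions, accuracy(predictions, EXPECTED))
```

The same voting function handles the ternary case unchanged if the training labels take three values (for example strong, weak, and non-carcinogen). As a consistency check on the reported figures, accuracies of 83.91 % and 80.46 % on an 87-compound set correspond to 73 and 70 correct predictions respectively; these counts are inferred from the percentages and are not stated in the abstract.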