Predicting Mutagenicity of Aromatic Amines by Various Machine Learning Approaches
Author(s) -
Max K. Leong,
Sheng-Wen Lin,
Hongbin Chen,
FuYuan Tsai
Publication year - 2010
Publication title -
Toxicological Sciences
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.352
H-Index - 183
eISSN - 1096-0929
pISSN - 1096-6080
DOI - 10.1093/toxsci/kfq159
Subject(s) - support vector machine, mean squared error, outlier, robustness (evolution), quantitative structure–activity relationship, machine learning, cross validation, artificial intelligence, training set, set (abstract data type), molecular descriptor, regression, regression analysis, test set, computer science, mathematics, statistics, chemistry, biochemistry, gene, programming language
Aromatic amines are widely used across many industries and are ubiquitous in foods and the environment. Many compounds in this class are potentially mutagenic or even carcinogenic, and assessing and predicting their mutagenicity is of practical importance because mutagenicity and carcinogenicity are toxicological end points that play major roles in the genesis of cancer and tumors. Quantitative structure-activity relationship (QSAR) models for a homogeneous set of mutagenicity data (TA98 + S9), comprehensively compiled from the literature, were developed using four machine learning methods: hierarchical support vector regression (HSVR), support vector machine, radial basis function neural networks, and genetic function algorithm. The predictions of these models are in good agreement with the experimental observations for the molecules in the training set (n = 97, r² = 0.78-0.93, q² = 0.64-0.93, root mean square error [RMSE] = 0.51-0.90, SD = 0.34-0.56) and the test set (n = 25, r² = 0.73-0.85, RMSE = 0.65-0.85, SD = 0.33-0.51). In addition, several validation criteria were adopted to verify the generated models, and a set of outliers was deliberately selected to examine the robustness of the four predictive models (n = 14, r² = 0.35-0.84, RMSE = 0.55-1.21, SD = 0.25-0.72). Finally, various cross-comparison schemes, namely forward comparisons, backward comparisons, and most common molecule comparisons, were carried out against assorted published predictive models. Our results indicate that the HSVR model is the most accurate, robust, and consistent and can be employed as a tool for predicting the mutagenicity of aromatic amines.
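The abstract reports r², q², and RMSE without defining them. The sketch below shows one common way these statistics are computed in QSAR work. It assumes r² is the squared Pearson correlation between observed and predicted activities and q² is 1 - PRESS/SS from cross-validated predictions; the authors' exact definitions may differ. The function names and the toy numbers are hypothetical and are not data from the study. The reported SD statistic is not covered, because the abstract does not say what it measures.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Squared Pearson correlation (assumed definition of r^2)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov ** 2 / (var_o * var_p)


def q_squared(observed, cv_predicted):
    """Cross-validated q^2 = 1 - PRESS / SS (assumed definition).

    cv_predicted[i] must come from a model trained without molecule i.
    """
    mean_o = sum(observed) / len(observed)
    press = sum((o - p) ** 2 for o, p in zip(observed, cv_predicted))
    ss_total = sum((o - mean_o) ** 2 for o in observed)
    return 1.0 - press / ss_total


if __name__ == "__main__":
    # Made-up toy values for illustration only.
    observed = [1.0, 2.0, 3.0]
    predicted = [2.0, 3.0, 4.0]  # every prediction is off by +1
    print(rmse(observed, predicted))
    print(r_squared(observed, predicted))
    print(q_squared(observed, predicted))
```

In this toy case the correlation-based r² is perfect even though every prediction is off by a constant, while RMSE and the PRESS-based statistic both penalize the offset. This is one reason such studies report several statistics side by side.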