Prediction of Human Clearance Based on Animal Data and Molecular Properties
Author(s) - Huang Wenkang, Geng Lv, Deng Rong, Lu Shaoyong, Ma Guangli, Yu Jianxiu, Zhang Jian, Liu Wei, Hou Tingjun, Lu Xuefeng
Publication year - 2015
Publication title - Chemical Biology and Drug Design
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.59
H-Index - 77
eISSN - 1747-0285
pISSN - 1747-0277
DOI - 10.1111/cbdd.12567
Subject(s) - support vector machine, outlier, molecular descriptor, cross validation, data set, mean squared error, extrapolation, test set, correlation coefficient, consistency (knowledge bases), computer science, similarity (geometry), biological system, pattern recognition (psychology), artificial intelligence, mathematics, data mining, quantitative structure–activity relationship, statistics, machine learning, biology, image (mathematics)
Human clearance is often predicted before clinical studies from in vivo preclinical data using interspecies allometric scaling. The aims of this study were to identify the molecular descriptors that matter most when extrapolating animal data to human clearance, and to build a model that predicts human clearance by combining animal data with the selected descriptors. The important descriptors, selected by a genetic algorithm (GA), came from five classes: quantum mechanical, shadow indices, E-state keys, molecular properties, and molecular property counts. Although the data set contained many outliers as determined by the conventional Mahmood method, the final support vector machine (SVM) model significantly reduced the variation of most of these outliers. In leave-one-out cross-validation (LOOCV), the final SVM model achieved a cross-validated correlation coefficient of 0.783 and a root-mean-squared error (RMSE) of 0.305. The reliability and consistency of the final model were also validated on an external test set. In conclusion, the SVM model based on animal data and the GA-selected molecular descriptors achieved better prediction performance than the Mahmood method. This approach can be applied as an improved interspecies allometric scaling method in drug research and development.
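
For readers unfamiliar with interspecies allometric scaling, the following is a minimal sketch of simple allometry, the power-law relationship CL = a * W^b fitted on a log-log scale and extrapolated to human body weight. It is not the authors' GA/SVM model and not the Mahmood method itself. The function names, the species weights, the clearance values, and the 70 kg human weight are all assumptions chosen for illustration; the clearances are synthetic, not taken from the paper.

```python
import math


def fit_simple_allometry(weights_kg, clearances):
    """Least-squares fit of log10(CL) = log10(a) + b * log10(W).

    Returns the coefficient a and exponent b of CL = a * W**b.
    """
    xs = [math.log10(w) for w in weights_kg]
    ys = [math.log10(c) for c in clearances]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope = allometric exponent
    a = 10 ** (mean_y - b * mean_x)    # intercept back-transformed
    return a, b


def predict_clearance(a, b, weight_kg):
    """Extrapolate clearance to a given body weight."""
    return a * weight_kg ** b


# Hypothetical body weights (kg) for three preclinical species.
species_weights = [0.25, 2.5, 10.0]
# Synthetic clearances generated from CL = 2.0 * W**0.75 (illustration only).
species_cl = [2.0 * w ** 0.75 for w in species_weights]

a, b = fit_simple_allometry(species_weights, species_cl)
human_cl = predict_clearance(a, b, 70.0)  # 70 kg is an assumed human weight
print(f"a = {a:.3f}, b = {b:.3f}, predicted human CL = {human_cl:.2f}")
```

Because the synthetic data follow an exact power law, the fit recovers the generating coefficient and exponent. Real preclinical data scatter around the line, which is the kind of deviation the abstract's descriptor-based SVM model is intended to reduce.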