Classification Models for Predicting Cytochrome P450 Enzyme‐Substrate Selectivity
Author(s) - Zhang Tao, Dai Hao, Liu Limin Angela, Lewis David F. V., Wei Dongqing
Publication year - 2012
Publication title - Molecular Informatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.481
H-Index - 68
eISSN - 1868-1751
pISSN - 1868-1743
DOI - 10.1002/minf.201100052
Subject(s) - cytochrome p450, cheminformatics, decision tree, enzyme, artificial intelligence, substrate specificity, isozyme, computer science, substrate (aquarium), class (philosophy), computational biology, molecular descriptor, machine learning, chemistry, quantitative structure–activity relationship, biochemistry, biology, computational chemistry, ecology
Cytochrome P450 (CYP) is an important family of drug‐metabolizing enzymes. Different CYPs often have different substrate preferences, and a single drug molecule may be preferentially metabolized by one or more CYP enzymes. Classifying and predicting the substrate specificity of CYP enzymes is therefore important for understanding drug metabolism and may help guide the development of new drugs. In this study, we used three different machine learning methods to classify CYP substrates and predict CYP‐substrate specificity based solely on the structural and physicochemical properties of the substrates. We first built a simple decision tree model that classifies substrates of four CYP enzymes (1A2, 2C9, 2D6 and 3A4) with more than 78 % classification accuracy. We then built a single‐label eight‐class model to classify substrates of eight CYP enzymes and a multilabel five‐class model to classify substrates that can be metabolized by more than one CYP enzyme. Prediction accuracies above 90 % and above 80 % were achieved for the single‐label and multilabel models, respectively. The main improvement of our models over existing ones is the automated and unbiased selection of descriptors by genetic algorithms, which makes our methods applicable to larger data sets and a larger number of CYP enzymes.
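The abstract does not give the descriptors, classifiers, or genetic algorithm settings used, so the following is only a minimal sketch of the general idea of descriptor selection by a genetic algorithm, not the authors' implementation. Everything in it is an assumption made for illustration: the synthetic two-class data from `make_toy_data` (not real CYP substrates), the nearest-centroid classifier standing in for the paper's models, the 0.01 per-descriptor penalty, and all function names and parameter values.

```python
import random


def make_toy_data(n_samples=120, n_descriptors=10, n_informative=3, seed=0):
    """Synthetic stand-in for a substrate-descriptor table (hypothetical data)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2  # two hypothetical classes, alternating
        # Only the first n_informative descriptors depend on the class label.
        row = [rng.gauss(2.0 * label if j < n_informative else 0.0, 1.0)
               for j in range(n_descriptors)]
        X.append(row)
        y.append(label)
    return X, y


def fitness(mask, train, valid):
    """Validation accuracy of a nearest-centroid classifier restricted to the
    descriptors switched on in mask, minus a small penalty per descriptor."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return -1.0  # an empty descriptor set is always the worst candidate
    (Xtr, ytr), (Xva, yva) = train, valid
    centroids = {}
    for label in set(ytr):
        rows = [x for x, t in zip(Xtr, ytr) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    hits = 0
    for x, t in zip(Xva, yva):
        pred = min(centroids, key=lambda c: sum(
            (x[j] - m) ** 2 for j, m in zip(cols, centroids[c])))
        hits += pred == t
    return hits / len(yva) - 0.01 * len(cols)


def ga_select(X, y, pop_size=20, generations=15, p_mut=0.1, seed=1):
    """Evolve binary descriptor masks; returns (best_mask, best_fitness, history)."""
    rng = random.Random(seed)
    n = len(X[0])
    split = (2 * len(X)) // 3
    train, valid = (X[:split], y[:split]), (X[split:], y[split:])

    def score(mask):
        return fitness(mask, train, valid)

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        elite = max(pop, key=score)
        history.append(score(elite))
        next_pop = [elite[:]]  # elitism: the best mask survives unchanged
        while len(next_pop) < pop_size:
            # Tournament selection of two parents, three candidates each.
            a = max(rng.sample(pop, 3), key=score)
            b = max(rng.sample(pop, 3), key=score)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            child = [1 - bit if rng.random() < p_mut else bit for bit in child]
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=score)
    return best, score(best), history


if __name__ == "__main__":
    X, y = make_toy_data()
    best_mask, best_fit, history = ga_select(X, y)
    print("selected descriptor indices:",
          [j for j, bit in enumerate(best_mask) if bit])
    print("fitness:", round(best_fit, 3))
```

Because the best mask of each generation is copied into the next one, the best fitness never decreases from one generation to the next. A real study would score candidate descriptor sets by cross-validation and keep a separate test set for the final accuracy estimate, rather than relying on the single split used here.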