The Development of Novel Chemical Fragment‐Based Descriptors Using Frequent Common Subgraph Mining Approach and Their Application in QSAR Modeling
Author(s) - Khashan Raed, Zheng Weifan, Tropsha Alexander
Publication year - 2014
Publication title - Molecular Informatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.481
H-Index - 68
eISSN - 1868-1751
pISSN - 1868-1743
DOI - 10.1002/minf.201300165
Subject(s) - quantitative structure–activity relationship, molecular descriptor, fragment (logic), molecular graph, artificial intelligence, computer science, data mining, graph, training set, pattern recognition (psychology), machine learning, algorithm, theoretical computer science
We present a novel approach to generating fragment-based molecular descriptors. Molecules are represented as labeled, undirected chemical graphs. Fast Frequent Subgraph Mining (FFSM) is used to find chemical fragments (subgraphs) that occur in at least a subset of all molecules in a dataset. The collection of frequent subgraphs (FSG) forms a set of dataset-specific descriptors whose value for each molecule is the number of times the corresponding frequent fragment occurs in that molecule. We employed the FSG descriptors to develop variable-selection k Nearest Neighbor (kNN) QSAR models for several datasets with a binary target property, including Maximum Recommended Therapeutic Dose (MRTD), Salmonella Mutagenicity (Ames Genotoxicity), and P-Glycoprotein (PGP) data. Each dataset was divided into training, test, and validation sets to establish the statistical figures of merit reflecting the models' validated predictive power. For all datasets, the classification accuracy of the models exceeded 75 % on both the training and test sets and exceeded 72 % on the external validation sets. These accuracies were comparable to or better than those reported earlier in the literature for the same datasets. Furthermore, the use of fragment-based descriptors affords mechanistic interpretation of the validated QSAR models in terms of the essential chemical fragments responsible for the compounds' target property.
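The descriptor construction described in the abstract can be illustrated with a minimal Python sketch. It is not the FFSM algorithm: as a simplified stand-in, it treats only single labeled bonds as candidate fragments, keeps those that occur in at least a minimum fraction of the molecules, and uses their per-molecule occurrence counts as descriptor values. The toy molecules, the function names, and the `min_support` parameter are all assumptions made for illustration and do not come from the paper.

```python
from collections import Counter

# Toy molecules as labeled undirected graphs: atom labels plus bonds
# given as (atom_index, atom_index, bond_label). Hypothetical examples,
# not data from the paper.
MOLECULES = {
    "ethanol": (["C", "C", "O"], [(0, 1, "-"), (1, 2, "-")]),
    "acetaldehyde": (["C", "C", "O"], [(0, 1, "-"), (1, 2, "=")]),
    "acetic_acid": (["C", "C", "O", "O"], [(0, 1, "-"), (1, 2, "="), (1, 3, "-")]),
    "methylamine": (["C", "N"], [(0, 1, "-")]),
}


def edge_fragments(atoms, bonds):
    """Count one-bond fragments, canonicalised so that C-O and O-C coincide."""
    counts = Counter()
    for i, j, bond in bonds:
        a, b = sorted((atoms[i], atoms[j]))
        counts[f"{a}{bond}{b}"] += 1
    return counts


def frequent_fragment_descriptors(molecules, min_support):
    """Return (frequent fragments, per-molecule occurrence-count vectors).

    A fragment is 'frequent' if it occurs in at least min_support
    (a fraction between 0 and 1) of the molecules. This threshold is an
    assumed parameter; the abstract does not state the value used.
    """
    per_mol = {name: edge_fragments(*graph) for name, graph in molecules.items()}

    # Support = number of molecules containing the fragment at least once.
    support = Counter()
    for counts in per_mol.values():
        support.update(counts.keys())

    threshold = min_support * len(molecules)
    frequent = sorted(f for f, s in support.items() if s >= threshold)

    # Descriptor value = number of occurrences of each frequent fragment.
    matrix = {
        name: [counts[f] for f in frequent] for name, counts in per_mol.items()
    }
    return frequent, matrix


FRAGMENTS, DESCRIPTORS = frequent_fragment_descriptors(MOLECULES, min_support=0.5)
```

In this toy set the C-N bond appears in only one of four molecules and is dropped, so every molecule is described by counts of the three remaining fragments. A real frequent subgraph miner would also grow these one-bond seeds into larger connected fragments, which this sketch deliberately omits.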