QSAR Prediction of Passive Permeability in the LLC‐PK1 Cell Line: Trends in Molecular Properties and Cross‐Prediction of Caco‐2 Permeabilities
Author(s) -
Sherer Edward C.,
Verras Andreas,
Madeira Maria,
Hagmann William K.,
Sheridan Robert P.,
Roberts Drew,
Bleasby Kelly,
Cornell Wendy D.
Publication year - 2012
Publication title - Molecular Informatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.481
H-Index - 68
eISSN - 1868-1751
pISSN - 1868-1743
DOI - 10.1002/minf.201100157
Subject(s) - quantitative structure–activity relationship, training set, molecular descriptor, permeability (electromagnetism), cell permeability, random forest, biological system, similarity (geometry), chemistry, cross validation, test set, artificial intelligence, computer science, machine learning, biochemistry, membrane, image (mathematics), biology
Abstract A QSAR model for predicting passive permeability (Papp) was derived from Papp values measured in the LLC‐PK1 cell line. The QSAR method and descriptor set that performed best in cross‐validation was random forest with a combination of AP, DP, and MOE_2D descriptors. The model was then used to predict the Caco‐2 cell permeability of 313 compounds described in the literature, with good success. We find that passive permeability in different cell lines can be predicted with similar molecular properties and descriptors. The variation in experimental measurements of Papp is smaller than the error in the QSAR predictions, indicating that the predictions are qualitatively useful but not quantitatively perfect. Predictions are better when the training set is large and diverse rather than smaller and more internally consistent, because prediction accuracy falls off quickly with decreasing similarity to the training set; it is therefore better to have as large a training set as possible. Although no single physical parameter predicts Papp as well as the full QSAR model, log D appears to be the most important parameter, and intermediate values of log D are associated with higher Papp.
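The abstract's point about similarity to the training set can be illustrated with a minimal sketch. It assumes each compound is reduced to a set of substructure features and that similarity is measured with the Tanimoto coefficient; the abstract names neither the similarity metric nor the feature encoding, so both are assumptions here, and the feature labels, compounds, and function names below are hypothetical. A query whose nearest training neighbor has low similarity would be one whose prediction deserves less trust.

```python
from typing import FrozenSet, Iterable


def tanimoto(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty sets
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def nearest_training_similarity(
    query: FrozenSet[str], training: Iterable[FrozenSet[str]]
) -> float:
    """Similarity of the query to its most similar training compound."""
    return max((tanimoto(query, t) for t in training), default=0.0)


# Hypothetical feature sets standing in for descriptor fingerprints;
# they are illustrative only and not taken from the paper.
training_set = [
    frozenset({"C-C", "C-N", "C-O", "N-O"}),
    frozenset({"C-C", "C-O", "O-O"}),
]
close_query = frozenset({"C-C", "C-N", "C-O"})
distant_query = frozenset({"S-S", "C-S"})

close_sim = nearest_training_similarity(close_query, training_set)
distant_sim = nearest_training_similarity(distant_query, training_set)
print(close_sim, distant_sim)
```

In practice such a score could be used to bin test compounds by similarity and compare prediction error across bins, which is one way to check how quickly accuracy degrades away from the training set.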