Open Access
Best of Both Worlds: Combining Pharma Data and State of the Art Modeling Technology To Improve in Silico pKa Prediction
Author(s) - Robert Fraczkiewicz, Mario Lobell, Andreas H. Göller, Ursula Krenz, Rolf Schoenneis, Robert D. Clark, Alexander Hillisch
Publication year - 2014
Publication title - Journal of Chemical Information and Modeling
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.24
H-Index - 160
eISSN - 1549-960X
pISSN - 1549-9596
DOI - 10.1021/ci500585w
Subject(s) - mean squared error , chemical space , test set , pipeline (software) , computer science , artificial neural network , set (abstract data type) , correlation coefficient , mean absolute error , in silico , data mining , artificial intelligence , machine learning , mathematics , statistics , drug discovery , chemistry , biochemistry , gene , programming language
In a unique collaboration between a software company and a pharmaceutical company, we were able to develop a new in silico pKa prediction tool with outstanding prediction quality. An existing pKa prediction method from Simulations Plus based on artificial neural network ensembles (ANNE), microstates analysis, and literature data was retrained with a large homogeneous data set of drug-like molecules from Bayer. The new model was thus built with curated sets of ∼14,000 literature pKa values (∼11,000 compounds, representing literature chemical space) and ∼19,500 pKa values experimentally determined at Bayer Pharma (∼16,000 compounds, representing industry chemical space). Model validation was performed with several test sets consisting of a total of ∼31,000 new pKa values measured at Bayer. For the largest and most difficult test set with >16,000 pKa values that were not used for training, the original model achieved a mean absolute error (MAE) of 0.72, root-mean-square error (RMSE) of 0.94, and squared correlation coefficient (R²) of 0.87. The new model achieves significantly improved prediction statistics, with MAE = 0.50, RMSE = 0.67, and R² = 0.93. It is commercially available as part of the Simulations Plus ADMET Predictor release 7.0. Good predictions are only of value when delivered effectively to those who can use them. The new pKa prediction model has been integrated into Pipeline Pilot and the PharmacophorInformatics (PIx) platform used by scientists at Bayer Pharma. Different output formats allow customized application by medicinal chemists, physical chemists, and computational chemists.
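The three validation statistics quoted above (MAE, RMSE, and R²) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names and the three-value toy data set are hypothetical, and R² is computed here as the squared Pearson correlation between measured and predicted values, which is one common reading of "squared correlation coefficient".

```python
import math


def mae(observed, predicted):
    """Mean absolute error between measured and predicted pKa values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root-mean-square error between measured and predicted pKa values."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def r_squared(observed, predicted):
    """Squared Pearson correlation coefficient (assumed definition of R^2)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov ** 2 / (var_o * var_p)


if __name__ == "__main__":
    # Hypothetical toy values for illustration only; not data from the paper.
    measured = [4.0, 7.0, 9.0]
    predicted = [4.5, 6.5, 9.5]
    print("MAE :", mae(measured, predicted))
    print("RMSE:", rmse(measured, predicted))
    print("R^2 :", r_squared(measured, predicted))
```

MAE and RMSE are in pKa units, so lower is better, while R² is unitless and approaches 1 as predictions track measurements more closely. RMSE weights large errors more heavily than MAE, which is why it is the larger of the two in both sets of figures reported above.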
