
Comparing Machine Learning Models for Aromatase (P450 19A1)
Author(s) -
Kimberley M. Zorn,
Daniel H. Foil,
Thomas R. Lane,
Wendy Hillwalker,
David J. Feifarek,
Frank E. Jones,
William D. Klaren,
Ashley M. Brinkman,
Sean Ekins
Publication year - 2020
Publication title -
Environmental Science & Technology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 2.851
H-Index - 397
eISSN - 1520-5851
pISSN - 0013-936X
DOI - 10.1021/acs.est.0c05771
Subject(s) - aromatase, machine learning, artificial intelligence, aromatase inhibitor, estrogen, computer science, androgen, computational biology, chemistry, biology, hormone, medicine, biochemistry, endocrinology, cancer, breast cancer
Aromatase, or cytochrome P450 19A1, catalyzes the aromatization of androgens to estrogens within the body. Changes in the activity of this enzyme can produce hormonal imbalances that may be detrimental to sexual and skeletal development. The enzyme can be inhibited by drugs and natural products as well as by environmental chemicals. Predicting potential endocrine disruption by exogenous chemicals therefore requires that aromatase inhibition be considered in addition to interference with androgen and estrogen pathways. Bayesian machine learning methods can make prospective predictions from molecular structure alone, without the need for experimental data on the compound being predicted. Herein, the generation and evaluation of multiple machine learning models trained on different sources of aromatase inhibition data are described. These models are externally validated on two test sets of public-domain molecules relevant to drug discovery. In addition, the performance of multiple machine learning algorithms was evaluated by comparing internal five-fold cross-validation statistics on the training data. When used in concert with estrogen and androgen machine learning models, these methods for predicting aromatase inhibition from molecular structure allow a more holistic assessment of the endocrine-disrupting potential of chemicals with limited empirical data and can help reduce the use of hazardous substances.
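The abstract refers to Bayesian models built from molecular structure and to comparing algorithms by five-fold cross-validation. The following is a minimal sketch only, assuming compounds have already been encoded as binary fingerprint-style bit vectors. It shows one simple Bayesian classifier (Bernoulli naive Bayes with Laplace smoothing) scored by pooled five-fold cross-validation ROC AUC, which is used here as one example statistic. The function names (`train_bernoulli_nb`, `log_odds`, `roc_auc`, `five_fold_cv`), the smoothing parameter `alpha`, and the synthetic toy data are hypothetical illustrations. This is not the authors' pipeline, software, or data.

```python
import math
import random


def train_bernoulli_nb(X, y, alpha=1.0):
    """Fit a Bernoulli naive Bayes model.

    X: list of 0/1 bit vectors (hypothetical fingerprint bits); y: 0/1 labels.
    alpha: Laplace smoothing, which keeps every probability strictly in (0, 1).
    """
    n_features = len(X[0])
    model = {}
    for c in (0, 1):
        rows = [x for x, label in zip(X, y) if label == c]
        prior = (len(rows) + alpha) / (len(X) + 2 * alpha)
        # Smoothed probability that each bit is set among class-c compounds
        probs = [(sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
                 for j in range(n_features)]
        model[c] = (math.log(prior), probs)
    return model


def log_odds(model, x):
    """Log posterior odds of class 1 (inhibitor) versus class 0 for bit vector x."""
    scores = {}
    for c, (log_prior, probs) in model.items():
        s = log_prior
        for bit, p in zip(x, probs):
            s += math.log(p if bit else 1.0 - p)
        scores[c] = s
    return scores[1] - scores[0]


def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def five_fold_cv(X, y, k=5, seed=0):
    """Shuffle, split into k folds, train on k-1 folds, score each held-out fold,
    then compute ROC AUC over all pooled held-out predictions."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    all_labels, all_scores = [], []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        model = train_bernoulli_nb([X[i] for i in train], [y[i] for i in train])
        for i in held_out:
            all_labels.append(y[i])
            all_scores.append(log_odds(model, X[i]))
    return roc_auc(all_labels, all_scores)


# Synthetic toy data (hypothetical, not from the paper): 16-bit vectors in which
# "active" compounds tend to set bits 0-3 more often than inactive ones.
rng = random.Random(42)
X, y = [], []
for i in range(100):
    label = i % 2
    bits = [1 if rng.random() < (0.8 if (label and j < 4) else 0.2) else 0
            for j in range(16)]
    X.append(bits)
    y.append(label)

auc = five_fold_cv(X, y)
print(f"pooled 5-fold ROC AUC: {auc:.2f}")
```

Pooling held-out predictions before computing a single ROC AUC is one common choice; averaging a per-fold statistic is another. Comparing several algorithms, as the abstract describes, would mean swapping a different classifier into the training step while keeping the same folds.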