Machine learning‐based prediction of enzyme substrate scope: Application to bacterial nitrilases
Author(s) - Mou Zhongyu, Eakes Jason, Cooper Connor J., Foster Carmen M., Standaert Robert F., Podar Mircea, Doktycz Mitchel J., Parks Jerry M.
Publication year - 2021
Publication title - Proteins: Structure, Function, and Bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.699
H-Index - 191
eISSN - 1097-0134
pISSN - 0887-3585
DOI - 10.1002/prot.26019
Subject(s) - support vector machine , random forest , machine learning , artificial intelligence , docking (animal) , computer science , scope (computer science) , substrate (aquarium) , homology modeling , substrate specificity , active site , chemistry , data mining , computational biology , enzyme , biochemistry , biology , programming language , medicine , nursing , ecology
Abstract - Predicting the range of substrates accepted by an enzyme from its amino acid sequence is challenging. Although sequence‐ and structure‐based annotation approaches are often accurate for predicting broad categories of substrate specificity, they generally cannot predict which specific molecules will be accepted as substrates for a given enzyme, particularly within a class of closely related molecules. Combining targeted experimental activity data with structural modeling, ligand docking, and physicochemical properties of proteins and ligands with various machine learning models provides complementary information that can lead to accurate predictions of substrate scope for related enzymes. Here we describe such an approach that can predict the substrate scope of bacterial nitrilases, which catalyze the hydrolysis of nitrile compounds to the corresponding carboxylic acids and ammonia. The four machine learning models (logistic regression, random forest, gradient‐boosted decision trees, and support vector machines) performed similarly (average ROC AUC = 0.9, average accuracy ≈ 82%) for predicting substrate scope for this dataset, although random forest offers some advantages. This approach is intended to be highly modular with respect to physicochemical property calculations and software used for structural modeling and docking.
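
The abstract summarizes model quality with two metrics, the area under the ROC curve and accuracy. As a minimal sketch of how such figures can be computed for a binary "substrate accepted or not" classifier, the Python below uses only the standard library. The labels, scores, function names, and the 0.5 decision threshold are hypothetical and chosen for illustration; they are not data or code from the paper.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation.

    This equals the probability that a randomly chosen active example is
    scored higher than a randomly chosen inactive one; ties count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of examples classified correctly at the given threshold."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


# Hypothetical enzyme-substrate pairs: 1 = nitrile accepted, 0 = not accepted.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
# Hypothetical predicted probabilities from some trained classifier.
y_score = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1, 0.55, 0.45]

if __name__ == "__main__":
    print(f"ROC AUC = {roc_auc(y_true, y_score):.2f}")
    print(f"accuracy = {accuracy(y_true, y_score):.2f}")
```

ROC AUC is independent of the decision threshold, whereas accuracy depends on it, which is one reason the two numbers are usually reported together.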