
Automation of some macromolecular properties using a machine learning approach
Author(s) - Merjem Hoxha, Hiqmet Kamberaj
Publication year - 2021
Publication title - Machine Learning: Science and Technology
Language(s) - English
Resource type - Journals
ISSN - 2632-2153
DOI - 10.1088/2632-2153/abe7b6
Subject(s) - artificial neural network, computer science, bootstrapping (finance), swarm behaviour, python (programming language), artificial intelligence, particle swarm optimization, automation, machine learning, algorithm, mathematics, mechanical engineering, engineering, econometrics, operating system
In this study, we employ a newly developed machine learning method, a swarm artificial neural network (ANN), to predict macromolecular properties. In this method, molecular structures are represented by feature description vectors that serve as training input data for the neural network. The study aims to develop an efficient approach for training an ANN using either experimental or quantum mechanics data. We also aim to introduce an error model that controls the reliability of the prediction confidence interval using a bootstrapping swarm approach. We created different datasets of selected experimental or quantum mechanics results. Using the optimized ANN, we hope to predict properties and their statistical errors for new molecules. Four datasets are used in this study: a dataset of 642 small organic molecules with known experimental hydration free energies, a dataset of 1475 experimental pKa values of ionizable groups in 192 proteins, a dataset of 2693 mutants in 14 proteins with experimental values of changes in the Gibbs free energy, and a dataset of 7101 quantum mechanics heat of formation calculations. All the data are prepared and optimized using the AMBER force field in the CHARMM macromolecular computer simulation program. The bootstrapping swarm ANN code for performing the optimization and prediction is written in the Python programming language. The descriptor vectors of the small molecules are based on the Coulomb matrix and sum over bond properties. For the macromolecular systems, the descriptors consider the chemical-physical fingerprints of the region in the vicinity of each amino acid.
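To make the small-molecule descriptor concrete, the following minimal Python sketch builds the Coulomb matrix in its commonly used textbook form: diagonal entries 0.5 * Z_i^2.4 and off-diagonal entries Z_i * Z_j / |R_i - R_j|. This is an illustration under stated assumptions, not the authors' code. The abstract says only that the descriptors are "based on" the Coulomb matrix and sum over bond properties, so the exact variant used in the study may differ. The function names `coulomb_matrix` and `descriptor_vector`, the flattening of the upper triangle into a feature vector, and the two-atom example geometry are all hypothetical choices made here.

```python
import math


def coulomb_matrix(charges, coords):
    """Build the standard Coulomb matrix as a list of lists.

    charges: nuclear charges Z_i (atomic numbers), one per atom
    coords:  Cartesian positions (x, y, z), one per atom, in consistent units
    """
    n = len(charges)
    if len(coords) != n:
        raise ValueError("charges and coords must have the same length")

    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Diagonal term: 0.5 * Z_i^2.4
        matrix[i][i] = 0.5 * charges[i] ** 2.4
        for j in range(i + 1, n):
            # Off-diagonal term: Z_i * Z_j / |R_i - R_j| (symmetric)
            distance = math.dist(coords[i], coords[j])
            if distance == 0.0:
                raise ValueError("two atoms share the same position")
            matrix[i][j] = matrix[j][i] = charges[i] * charges[j] / distance
    return matrix


def descriptor_vector(matrix):
    """Flatten the upper triangle (diagonal included) into a feature vector.

    Assumption: this flattening is one simple way to feed the symmetric
    matrix to a neural network; the source does not specify how it is done.
    """
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i, n)]


if __name__ == "__main__":
    # Hypothetical two-atom example: two hydrogen nuclei one length unit apart.
    example = coulomb_matrix([1, 1], [(0.0, 0.0, 0.0), (0.0, 0.0, 1.0)])
    print(descriptor_vector(example))
```

Because the raw matrix depends on the order in which atoms are listed, implementations typically sort rows and columns or use eigenvalues before training. Whether and how the study does this is not stated in the abstract.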