A machine learning workflow for molecular analysis: application to melting points
Author(s) -
Ganesh Sivaraman,
Nicholas E. Jackson,
Benjamín Sánchez-Lengeling,
Álvaro Vázquez-Mayagoitia,
Alán Aspuru-Guzik,
Venkatram Vishwanath,
Juan Pablo
Publication year - 2020
Publication title - Machine Learning: Science and Technology
Language(s) - English
Resource type - Journals
ISSN - 2632-2153
DOI - 10.1088/2632-2153/ab8aa3
Subject(s) - workflow, computer science, autoencoder, molecular graph, context (archaeology), artificial intelligence, cluster analysis, machine learning, representation (politics), data mining, graph, theoretical computer science, deep learning, database, paleontology, politics, political science, law, biology
Computational tools encompassing integrated molecular prediction, analysis, and generation are key for molecular design in a variety of critical applications. In this work, we develop a workflow for molecular analysis (MOLAN) that integrates an ensemble of supervised and unsupervised machine learning techniques to analyze molecular datasets. The MOLAN workflow combines molecular featurization, clustering algorithms, uncertainty analysis, low-bias dataset construction, high-performance regression models, graph-based molecular embeddings and attribution, and a semi-supervised variational autoencoder based on the novel SELFIES representation to enable molecular design. We demonstrate the utility of the MOLAN workflow in the context of a challenging multi-molecule property prediction problem: the determination of melting points solely from single-molecule structure. This application serves as a case study for how to employ the MOLAN workflow in molecular property prediction.