Special issue on machine learning and quantum mechanics
Author(s) - Matthias Rupp
Publication year - 2015
Publication title - International Journal of Quantum Chemistry
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.484
H-Index - 105
eISSN - 1097-461X
pISSN - 0020-7608
DOI - 10.1002/qua.24955
Subject(s) - ansatz, computer science, statistical physics, quantum, relaxation, electronic structure, theoretical computer science, mathematics, quantum mechanics, physics
Models that combine quantum mechanics (QM) with machine learning (ML) have seen strong renewed interest in recent years. This is reflected in dedicated research programs, workshops, and publications such as this special issue of the International Journal of Quantum Chemistry on ML and QM. In the following, I briefly outline the idea and history of these models, focusing on contributions in this issue.

Systematic computational design and study of molecules and materials requires rigorous, unbiased, and accurate treatment on the atomic scale. While numerical approximations to the many-electron problem have become available, their prohibitive computational cost severely limits their applicability. Based on the reasoning that electronic structure calculations of similar systems contain redundant information, ML models have been developed that interpolate between a computationally feasible number of QM reference calculations to predict properties of new, similar systems. Essentially, the problem of solving the electronic Schrödinger equation is mapped onto a nonlinear statistical regression problem. Example applications include structural relaxation, molecular dynamics, and high-throughput calculation of quantum chemical properties. This ansatz has been demonstrated to enable computational savings of up to several orders of magnitude, with accuracy on par with the reference method, in applications involving large systems, long time scales, or large numbers of systems. Interpolating QM results poses challenges that are distinctly different from those in cheminformatics (e.g., quantitative structure-property/activity relationships), where experimental results are interpolated: in particular, there is no noise, as the modeled properties are outcomes of deterministic procedures, and representations have to respect small changes in geometry to enable modeling of properties with high accuracy.
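A minimal sketch of this regression view, using kernel ridge regression with a Gaussian kernel on synthetic stand-in data (the descriptors, target function, and all parameter values below are illustrative, not taken from any contribution in this issue):

```python
import numpy as np

# Synthetic stand-in for QM reference data: 1-D descriptors x and a smooth,
# deterministic "property" y(x) playing the role of expensive reference calculations.
rng = np.random.default_rng(0)
X_train = rng.uniform(-3, 3, size=(40, 1))                   # 40 reference "systems"
y_train = np.sin(X_train[:, 0]) + 0.1 * X_train[:, 0] ** 2   # no noise: deterministic property

def gaussian_kernel(A, B, sigma=1.0):
    """Pairwise Gaussian (RBF) kernel between rows of A and rows of B."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-d2 / (2 * sigma**2))

# Kernel ridge regression: solve (K + lambda*I) alpha = y once over the reference set ...
lam = 1e-8  # tiny regularizer; the modeled property is deterministic, so there is little noise to absorb
K = gaussian_kernel(X_train, X_train)
alpha = np.linalg.solve(K + lam * np.eye(len(X_train)), y_train)

# ... then predict a new "system" at the cost of one kernel-vector product,
# instead of a new electronic structure calculation.
X_new = np.array([[0.5]])
y_pred = gaussian_kernel(X_new, X_train) @ alpha
print(y_pred.item())  # close to sin(0.5) + 0.1*0.25 ≈ 0.504
```

The computational saving comes from the fact that, once the reference calculations are done, each new prediction costs only a kernel evaluation against the training set.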
Although interpolation techniques have been used in early QM calculations, the systematic application of methods belonging to what is today called ML started perhaps around the 1990s, an example being the fitting of eigenenergies of harmonic oscillators by artificial neural networks (ANNs). In the following decade, ANNs were used for interpolation of potential energy surfaces of single systems and have since developed into powerful tools for large-scale molecular dynamics simulations. A variety of other approaches, including Shepard interpolation, cubic splines, moving least squares, and symbolic regression, were used as well. In this issue, a tutorial review of ANN potentials is given by Jörg Behler, one of their major proponents; Sergei Manzhos, Richard Dawes, and Tucker Carrington Jr. discuss sum-of-product ANNs related to many-body expansions. Interpolation between QM results for different systems, for example, molecular property estimates, started roughly a decade later, first with ANNs, joined later by kernel-based ML methods such as support vector machines and Gaussian process regression (GPR). The reader can find a brief general introduction to kernel-based ML for QM data in my tutorial. GPR, sometimes known as Kriging, yields the same predictions as kernel ridge regression (KRR), although other features, such as the predictive variance, differ. Both GPR and KRR have become popular for predictions across chemical compound space and for interpolation of potential energy surfaces. In their contribution, Albert P. Bartók and Gábor Csányi take the reader on a tour through their Gaussian approximation potentials approach for potential energy surface interpolation, and Paul L. A. Popelier summarizes progress on the development of a GPR potential for peptides and proteins based on Quantum Chemical Topology.
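The equivalence of the GPR posterior mean and the KRR prediction (for matching kernel, with the GPR noise variance playing the role of the ridge parameter) can be checked numerically; the data below are synthetic and purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 2))              # 30 training points in 2-D
y = np.sin(X[:, 0]) * np.cos(X[:, 1])     # illustrative target

def rbf(A, B, sigma=1.0):
    """Gaussian (RBF) kernel matrix between rows of A and rows of B."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-d2 / (2 * sigma**2))

lam = 1e-3                                # ridge parameter == GPR noise variance
K = rbf(X, X)
x_star = np.array([[0.2, -0.4]])
k_star = rbf(x_star, X)

# KRR prediction: y* = k*^T (K + lam*I)^{-1} y
y_krr = k_star @ np.linalg.solve(K + lam * np.eye(len(X)), y)

# GPR posterior mean: mathematically identical, here computed via Cholesky;
# GPR additionally yields a predictive variance, which KRR does not provide.
L = np.linalg.cholesky(K + lam * np.eye(len(X)))
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
y_gpr = k_star @ alpha
var = rbf(x_star, x_star) - k_star @ np.linalg.solve(L.T, np.linalg.solve(L, k_star.T))

print(bool(np.allclose(y_krr, y_gpr)))    # True: identical mean predictions
```

The predictive variance `var` is the "extra" quantity GPR offers beyond the shared point prediction; it is what the phrase about differing features beyond the prediction refers to.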
An example of thermochemical property predictions across different molecules can be found in the article by Jianming Wu, Yuwei Zhou, and Xin Xu, who use ANNs to statistically correct DFT/B3LYP predictions with respect to experimental values. One of the most important aspects of a QM/ML model is how a system, be it molecular or periodic, is numerically represented for interpolation. A wide variety of representations has been proposed, including symmetry functions, ad hoc descriptors, smooth overlap of atomic positions, and the Coulomb matrix. O. Anatole von Lilienfeld et al. discuss requirements on molecular representations, using the example of a descriptor based on Fourier expansions of atomic radial basis functions. Developing representations that generalize across different materials has been challenging so far; Felix Faber et al. present new results on generalizations of the Coulomb matrix to periodic systems. On the ML side, Kevin Vu et al. present a detailed analysis of the workhorse model of many studies, KRR with a Gaussian kernel. John Snyder et al., in a continuation of previous work on learning the kinetic energy as a functional of the electron density for orbital-free density functional theory, describe an improved algorithm to constrain density optimization to the training data manifold. Scaling up ML-based large-scale molecular dynamics simulations requires dedicated methodological effort. Venkatesh Botu and Rampi Ramprasad describe dynamic model retraining ("learning on the fly") for their models using fingerprint representations of materials. Marco Caccin et al. report on a