Open Access
Thermochemical Data Fusion Using Graph Representation Learning
Author(s) - Himaghna Bhattacharjee, Dionisios G. Vlachos
Publication year - 2020
Publication title - Journal of Chemical Information and Modeling
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.24
H-Index - 160
eISSN - 1549-960X
pISSN - 1549-9596
DOI - 10.1021/acs.jcim.0c00699
Subject(s) - computer science, outlier, sensor fusion, computation, graph, fusion, data set, data mining, artificial intelligence, machine learning, theoretical computer science, algorithm
Large databases are required for "Big Data" applications in catalysis and materials science. Thermochemical databases can be created by combining data from various sources and by correcting low-fidelity data sets to higher accuracy with minimal computation. To achieve this "data fusion", thermochemical quantities of interest, calculated at various levels of density functional theory (DFT), need to be mapped to the same high level of theory. In this work, a graph-theoretical, statistical framework is proposed for such tasks. Subgraph frequencies are shown to provide a natural representation for learning these fusion maps. The maps are linear and are learned with automated descriptor selection. Using a data set of as little as ∼1% of the QM9 database of 133 885 molecules, these models can predict multiple thermochemical quantities at a higher level of theory with an accuracy of 1 kcal/mol. The method is explainable and generalizable, and it provides a diagnostic tool for outlier identification.
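The core idea above, a linear map from a low-fidelity value to a higher level of theory expressed in subgraph-frequency descriptors, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it counts only atoms and bonded atom pairs (labeled subgraphs of up to two nodes), fits the correction ΔE = E_high − E_low with ridge-regularized least squares in place of the paper's automated descriptor selection, and flags outliers with a fixed residual threshold. The molecular-graph format (element lists plus `(i, j, order)` bond tuples) and every function name (`subgraph_counts`, `fit_fusion_map`, `predict_high`, `flag_outliers`) are hypothetical.

```python
from collections import Counter
from itertools import chain

BOND_SYMBOLS = {1: "-", 2: "=", 3: "#"}  # bond order -> label used in feature names


def subgraph_counts(atoms, bonds):
    """Count labeled 1-node (atom) and 2-node (bonded pair) subgraphs.

    Hypothetical graph format: atoms is a list of element symbols and
    bonds is a list of (i, j, order) tuples indexing into atoms.
    """
    counts = Counter(atoms)
    for i, j, order in bonds:
        a, b = sorted((atoms[i], atoms[j]))  # order-independent edge label
        counts[f"{a}{BOND_SYMBOLS[order]}{b}"] += 1
    return counts


def _solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_fusion_map(graphs, low, high, lam=1e-6):
    """Fit high - low ≈ sum_k w_k * n_k, where n_k are subgraph counts.

    Assumption: ridge regression on all counts stands in for the
    paper's automated descriptor selection.
    """
    counts = [subgraph_counts(atoms, bonds) for atoms, bonds in graphs]
    features = sorted(set(chain.from_iterable(counts)))
    X = [[c[f] for f in features] for c in counts]
    y = [h - l for h, l in zip(high, low)]
    p = len(features)
    # Normal equations: (X^T X + lam * I) w = X^T y
    XtX = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yk for row, yk in zip(X, y)) for i in range(p)]
    return features, _solve(XtX, Xty)


def predict_high(graph, low_value, features, weights):
    """Map one low-fidelity value to the higher level of theory."""
    counts = subgraph_counts(*graph)  # unseen subgraphs simply get no weight
    return low_value + sum(w * counts[f] for f, w in zip(features, weights))


def flag_outliers(residuals, threshold):
    """Return indices whose |residual| exceeds a fixed (assumed) threshold."""
    return [i for i, r in enumerate(residuals) if abs(r) > threshold]
```

In use, one would fit on a small set of molecules computed at both levels, then pass held-out residuals `high - predict_high(...)` to `flag_outliers`; a 1 kcal/mol threshold is an assumed choice that merely echoes the accuracy quoted above. Keeping the model linear ties each weight to a named subgraph such as `C-H` or `C=C`, which is what makes this kind of map easy to inspect.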
