Thermochemical Data Fusion Using Graph Representation Learning

Open Access

Author(s) - Himaghna Bhattacharjee, Dionisios G. Vlachos

Publication year - 2020

Publication title - Journal of Chemical Information and Modeling

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 1.24

H-Index - 160

eISSN - 1549-960X

pISSN - 1549-9596

DOI - 10.1021/acs.jcim.0c00699

Subject(s) - computer science, outlier, representation (politics), sensor fusion, computation, graph, fusion, data set, data mining, artificial intelligence, machine learning, theoretical computer science, algorithm, linguistics, philosophy, politics, political science, law

Large databases are required for "Big Data" applications in catalysis and materials science. Thermochemical databases can be created by combining data from various sources and by correcting low-fidelity data sets to higher accuracy with minimal computation. To achieve this "data fusion", thermochemical quantities of interest, calculated at various levels of density functional theory (DFT), need to be mapped to the same, high level of theory. In this work, a graph-theoretical, statistical framework is proposed for such tasks. Subgraph frequencies are shown to provide a natural representation for learning these fusion maps. The maps are linear and are learnt with automated descriptor selection. Trained on as little as ∼1% of the QM9 database of 133 885 molecules, these models can predict multiple thermochemical quantities at a higher level of theory with an accuracy of 1 kcal/mol. The method is explainable, generalizable, and provides a diagnostic tool for outlier identification.
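
The idea of a linear fusion map with automated descriptor selection can be illustrated with a small sketch. This is not the authors' implementation: it assumes that the difference between a high and a low level of theory is modeled as a sparse linear function of subgraph counts, and it uses L1-regularized regression (LASSO, solved by coordinate descent) as one common way to select descriptors. The abstract does not name the selection method. The subgraph labels, the per-subgraph corrections, and the data are all synthetic and hypothetical.

```python
import random

# Hypothetical subgraph descriptors (bond types); not the paper's descriptor set.
SUBGRAPHS = ["C-C", "C-H", "C=O", "C-N", "O-H", "C-O"]
# Assumed per-subgraph error of the low level of theory in kcal/mol (synthetic).
TRUE_CORRECTION = {"C-H": 0.8, "C=O": -1.5}


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, max_sweeps=500, tol=1e-9):
    """Minimise (1/2n)*||y - Xw||^2 + alpha*||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(max_sweeps):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # descriptor never occurs; leave its weight at zero
            # Correlation of column j with the residual that excludes w[j].
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_w
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break  # weights stopped changing
    return w


# Synthetic "molecules": each row holds the count of every subgraph type.
rng = random.Random(0)
X = [[rng.randint(0, 6) for _ in SUBGRAPHS] for _ in range(400)]
# Target: high-level minus low-level value, plus small synthetic noise.
y = [
    sum(TRUE_CORRECTION.get(name, 0.0) * count for name, count in zip(SUBGRAPHS, row))
    + rng.gauss(0.0, 0.02)
    for row in X
]

weights = lasso_coordinate_descent(X, y, alpha=0.1)
# Descriptors with a non-zero weight are the ones the model selected.
selected = {name: round(w, 3) for name, w in zip(SUBGRAPHS, weights) if w != 0.0}
print(selected)
```

In a sketch like this, the surviving weights read as per-subgraph corrections, which is one way such a model can be explainable. Molecules with unusually large residuals would be natural candidates for outlier inspection.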
