QUANTIS: Data quality assessment tool by clustering analysis
Author(s) -
Symoens Steffen H.,
Aravindakshan Syam Ukkandath,
Vermeire Florence H.,
De Ras Kevin,
Djokic Marko R.,
Marin Guy B.,
Reyniers Marie-Françoise,
Van Geem Kevin M.
Publication year - 2019
Publication title -
International Journal of Chemical Kinetics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.341
H-Index - 68
eISSN - 1097-4601
pISSN - 0538-8066
DOI - 10.1002/kin.21316
Subject(s) - computer science, data mining, python (programming language), experimental data, identifier, cluster analysis, set (abstract data type), data quality, algorithm, machine learning, programming language, mathematics, statistics, metric (unit), operations management, economics
Automatically generated kinetic networks are ideally validated against a large set of accurate, reproducible, and easy-to-model experimental data. Although this might seem simple, it proves to be quite challenging in practice. QUANTIS, a publicly available Python package, was developed specifically to evaluate both the precision and the accuracy of experimental data and to provide a uniform, fast processing and storage strategy that enables automated comparison against developed kinetic models. Precision is investigated with two clustering techniques, principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE), whereas accuracy is probed by checking the conservation laws. First, the tool processes, evaluates, and stores experimental yield data automatically. All data belonging to a given experiment, both unprocessed and processed, are stored in an HDF5 container. Demonstrating QUANTIS on three different pyrolysis cases showed that it can help identify and overcome instabilities in experimental datasets, reduce mass and molar balance closure discrepancies, and, through the visualized correlation matrices, improve understanding of the underlying reaction pathways. Because all experimental data are included in the HDF5 file, simulating the experiment with CHEMKIN can be automated. Because molecules are identified by InChI strings, the comparison between experiment and simulation can be automated as well. QUANTIS and the concepts demonstrated therein are potentially useful for data quality assessment, kinetic model validation, and model refinement.
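The conservation-law check on accuracy can be illustrated with a short, stdlib-only Python sketch. It is not QUANTIS code: the helper names `formula_from_inchi`, `element_mass_flows`, and `balance_closures` are hypothetical, atomic masses are rounded, and the parser assumes a simple single-component InChI formula layer (for example `C2H4`). Given mass flows keyed by InChI for the feed and the products, it computes the overall mass closure and per-element closures (outlet over inlet). The ethane example is a toy stoichiometric case, not data from the paper.

```python
import re
from collections import Counter

# Rounded standard atomic masses (g/mol); extend for other elements as needed.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def formula_from_inchi(inchi: str) -> Counter:
    """Element counts from the formula layer of a standard InChI (hypothetical helper).

    Assumes a single-component formula such as 'C2H4' (the text between the
    first and second '/'); salts and multi-component layers are not handled.
    """
    formula = inchi.split("/")[1]
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def molar_mass(counts: Counter) -> float:
    """Molar mass (g/mol) from element counts."""
    return sum(ATOMIC_MASS[el] * n for el, n in counts.items())


def element_mass_flows(flows: dict[str, float]) -> Counter:
    """Mass flow of each element, given {InChI: mass flow} for one stream."""
    totals = Counter()
    for inchi, mass_flow in flows.items():
        counts = formula_from_inchi(inchi)
        species_mass = molar_mass(counts)
        for el, n in counts.items():
            # Mass fraction of element `el` in this species times the species mass flow.
            totals[el] += mass_flow * n * ATOMIC_MASS[el] / species_mass
    return totals


def balance_closures(feed: dict[str, float], products: dict[str, float]) -> dict[str, float]:
    """Overall mass closure and per-element closures (outlet / inlet)."""
    feed_el = element_mass_flows(feed)
    prod_el = element_mass_flows(products)
    closures = {"mass": sum(products.values()) / sum(feed.values())}
    for el, inlet in feed_el.items():
        closures[el] = prod_el[el] / inlet
    return closures


if __name__ == "__main__":
    # Toy case: ideal ethane dehydrogenation, C2H6 -> C2H4 + H2, flows in g/s.
    ethane = "InChI=1S/C2H6/c1-2/h1-2H3"
    ethylene = "InChI=1S/C2H4/c1-2/h1-2H2"
    hydrogen = "InChI=1S/H2/h1H"
    m_feed = molar_mass(formula_from_inchi(ethane))
    products = {
        ethylene: molar_mass(formula_from_inchi(ethylene)) / m_feed,
        hydrogen: molar_mass(formula_from_inchi(hydrogen)) / m_feed,
    }
    print(balance_closures({ethane: 1.0}, products))
    # Dropping H2 from the product list lowers the H closure but not the C closure.
    print(balance_closures({ethane: 1.0}, {ethylene: products[ethylene]}))
```

Because an element's atomic mass cancels in the outlet/inlet ratio, the per-element closures are the same on a mass or a molar basis. Keying everything by InChI is what lets the same routine work on any product slate without a hand-maintained species table.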
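The idea behind the PCA-based precision check can be pictured with a minimal sketch. Assume, for illustration only, that each row holds the species yields from one repeated measurement of the same experiment: precise data should cluster tightly in principal-component space, while a drifting measurement stands out. To stay dependency-free, the sketch computes PCA by power iteration with deflation in pure Python; a real pipeline would more likely use `sklearn.decomposition.PCA` (and `sklearn.manifold.TSNE` for the t-SNE view, which is not shown here). The function names and the yield values are invented for illustration and are not taken from QUANTIS or the paper.

```python
import math
import random
from statistics import mean, stdev


def standardize(rows):
    """Center each column (species yield) and scale it to unit sample standard deviation."""
    stats = [(mean(col), stdev(col) or 1.0) for col in zip(*rows)]
    return [[(x - mu) / sd for x, (mu, sd) in zip(row, stats)] for row in rows]


def covariance(z):
    """Sample covariance matrix of already-centered data (rows = observations)."""
    n, p = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)] for i in range(p)]


def leading_components(cov, k=2, iters=500, seed=0):
    """Top-k eigenpairs of a symmetric matrix via power iteration with deflation."""
    rng = random.Random(seed)
    p = len(cov)
    a = [row[:] for row in cov]
    comps = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(p)]
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # remaining matrix is numerically zero
                break
            v = [x / norm for x in w]
        # Rayleigh quotient of the converged unit vector gives its eigenvalue.
        lam = sum(v[i] * sum(a[i][j] * v[j] for j in range(p)) for i in range(p))
        comps.append((lam, v))
        # Deflation: subtract lam * v v^T so the next pass finds the next component.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    return comps


def pca_scores(rows, k=2):
    """Project standardized observations onto the first k principal components."""
    z = standardize(rows)
    comps = leading_components(covariance(z), k)
    return [[sum(x * c for x, c in zip(r, vec)) for _, vec in comps] for r in z]


if __name__ == "__main__":
    # Invented wt% yields (ethylene, methane, propylene) for six repeated
    # measurements of one experiment; the last one is made to drift.
    measurements = [
        [50.1, 20.0, 10.0],
        [49.9, 20.1, 10.1],
        [50.0, 19.9, 9.9],
        [50.2, 20.0, 10.0],
        [49.8, 20.1, 10.0],
        [45.0, 24.0, 12.0],
    ]
    for i, (pc1, pc2) in enumerate(pca_scores(measurements)):
        print(f"measurement {i}: PC1 = {pc1:+.2f}, PC2 = {pc2:+.2f}")
```

The sign of each principal component is arbitrary, so only the spread of the scores matters: here the five consistent rows sit close together on PC1, and the drifting row lies far from them. Standardizing first keeps high-yield species from dominating the components purely because of their magnitude.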