Visualization and Analysis of Complex Reaction Data: The Case of Tautomeric Equilibria
Author(s) - Glavatskikh Marta, Madzhidov Timur, Baskin Igor I., Horvath Dragos, Nugmanov Ramil, Gimadiev Timur, Marcou Gilles, Varnek Alexandre
Publication year - 2018
Publication title - Molecular Informatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.481
H-Index - 68
eISSN - 1868-1751
pISSN - 1868-1743
DOI - 10.1002/minf.201800056
Subject(s) - tautomer, set (abstract data type), visualization, biological system, chemistry, function (biology), computer science, data mining, pattern recognition (psychology), algorithm, artificial intelligence, stereochemistry, biology, programming language, evolutionary biology
The Generative Topographic Mapping (GTM) approach was successfully used to visualize, analyze and model the equilibrium constants (K_T) of tautomeric transformations as a function of both structure and experimental conditions. The modeling set contained 695 entries corresponding to 350 unique transformations of 10 tautomeric types, for which K_T values were measured in different solvents and at different temperatures. Two types of GTM-based classification models were trained: first, a "structural" approach focused on separating tautomeric classes irrespective of reaction conditions, then a "general" approach accounting for both structure and conditions. In both cases, the cross-validated Balanced Accuracy was close to 1, and the clusters assembling equilibria of particular classes were well separated in the 2-dimensional GTM latent space. Data points corresponding to similar transformations measured under different experimental conditions are well separated on the maps. Additionally, the predictive performance of GTM-driven regression models was found to depend on how the local fragment descriptors, which involve specially marked atoms (proton donors or acceptors), were selected. The application of local descriptors significantly improves model performance in 5-fold cross-validation: RMSE = 0.63 and 0.82 log K_T units with and without local descriptors, respectively. The same trend was observed for SVR calculations performed for comparison.
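For readers unfamiliar with the two metrics quoted in the abstract, the following is a minimal sketch of how Balanced Accuracy (mean per-class recall) and RMSE are commonly computed. It is not the authors' code: the function names, class labels and numeric values are made up for illustration and do not come from the paper's data set.

```python
from collections import defaultdict
from math import sqrt


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally regardless of size."""
    hits = defaultdict(int)    # correctly predicted items per true class
    totals = defaultdict(int)  # number of items per true class
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sqrt(sum(squared) / len(squared))


# Hypothetical toy data: labels stand in for tautomeric classes,
# numbers stand in for observed vs. predicted log K_T values.
classes_true = ["keto-enol", "keto-enol", "keto-enol", "amine-imine"]
classes_pred = ["keto-enol", "keto-enol", "keto-enol", "keto-enol"]
ba = balanced_accuracy(classes_true, classes_pred)

logk_true = [1.0, 2.0]
logk_pred = [1.0, 4.0]
err = rmse(logk_true, logk_pred)

print(ba, err)
```

In the toy classification case, plain accuracy would be 0.75, while Balanced Accuracy is lower because the minority class is never recovered. This is why the metric is informative for data sets in which the classes are unevenly populated.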
