Block‐wise Exploration of Molecular Descriptors with Multi‐block Orthogonal Component Analysis (MOCA)
Author(s) - Schmidt Sebastian, Schindler Michael, Eriksson Lennart
Publication year - 2022
Publication title - Molecular Informatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.481
H-Index - 68
eISSN - 1868-1751
pISSN - 1868-1743
DOI - 10.1002/minf.202100165
Subject(s) - block (permutation group theory), principal component analysis, redundancy (engineering), computer science, a priori and a posteriori, metric (unit), molecular descriptor, component (thermodynamics), cheminformatics, data mining, feature selection, set (abstract data type), pattern recognition (psychology), artificial intelligence, quantitative structure–activity relationship, mathematics, machine learning, chemistry, engineering, computational chemistry, philosophy, operations management, physics, geometry, epistemology, thermodynamics, operating system, programming language
Data tables for machine learning and quantitative structure‐activity relationship (QSAR) modelling are often naturally organized in blocks, where each block is a molecular representation or a set of descriptors. Multi‐block Orthogonal Component Analysis (MOCA), a new analytical tool, explores such data structures in a single model and identifies principal components that are either unique to one block or joint across several blocks. We applied MOCA to two data sets of 550 and 300 molecules with up to 9213 molecular descriptors organized in 11 blocks. The MOCA models reveal relationships between the blocks and overarching trends across the whole data set. Based on the MOCA joint components, we propose a quantitative metric for the redundancy of blocks, which is useful for a priori block‐wise feature selection or for evaluating new molecular representations. The second data set includes 7 ecotoxicological study endpoints for crop protection chemicals, for which we (re‐)discovered some general trends and linked them to molecular properties. Using a single MOCA model, we estimated the predictive potential of each block and the model‐ability of the target block.
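
To make the idea of block‐wise redundancy concrete, the following is a minimal illustrative sketch in Python. It is not MOCA and not the metric proposed in the paper, neither of which is specified in this abstract. It assumes a simple stand‐in: each block is autoscaled, its leading PCA score vectors are extracted, and the overlap between two blocks is taken as the mean squared cosine of the principal angles between their score subspaces. The function names (`autoscale`, `block_scores`, `shared_fraction`), the block sizes, and the synthetic data are all hypothetical.

```python
import numpy as np


def autoscale(X):
    """Center each column and scale it to unit variance."""
    X = X - X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # constant columns stay at zero instead of dividing by 0
    return X / std


def block_scores(X, n_components):
    """Orthonormal PCA score vectors (left singular vectors) of one block."""
    U, _, _ = np.linalg.svd(autoscale(X), full_matrices=False)
    return U[:, :n_components]


def shared_fraction(X_a, X_b, n_components=2):
    """Mean squared cosine of the principal angles between two score subspaces.

    Values near 1 mean the leading components of both blocks describe the
    same variation across molecules; values near 0 mean they are unrelated.
    This is an assumed proxy for redundancy, not the MOCA-based metric.
    """
    U_a = block_scores(X_a, n_components)
    U_b = block_scores(X_b, n_components)
    # Singular values of U_a^T U_b are the cosines of the principal angles.
    cosines = np.linalg.svd(U_a.T @ U_b, compute_uv=False)
    return float(np.mean(cosines ** 2))


# Synthetic example: blocks A and B share a two-dimensional latent trend,
# block C is unrelated noise. All sizes are arbitrary.
rng = np.random.default_rng(0)
n_molecules = 100
latent = rng.normal(size=(n_molecules, 2))

block_a = latent @ rng.normal(size=(2, 30)) + 0.1 * rng.normal(size=(n_molecules, 30))
block_b = latent @ rng.normal(size=(2, 50)) + 0.1 * rng.normal(size=(n_molecules, 50))
block_c = rng.normal(size=(n_molecules, 40))

print("A vs B:", shared_fraction(block_a, block_b))
print("A vs C:", shared_fraction(block_a, block_c))
```

In this toy setting the overlap between blocks A and B should come out close to 1 and the overlap between A and C close to 0. A pairwise comparison like this only hints at what a joint multi‐block model provides, since MOCA, as described above, separates joint and unique components for all blocks in a single model.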
