Distance phenomena in high‐dimensional chemical descriptor spaces: Consequences for similarity‐based approaches
Author(s) - Rupp Matthias, Schneider Petra, Schneider Gisbert
Publication year - 2009
Publication title - Journal of Computational Chemistry
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.907
H-Index - 188
eISSN - 1096-987X
pISSN - 0192-8651
DOI - 10.1002/jcc.21218
Subject(s) - cheminformatics, similarity (geometry), ranking (information retrieval), similarity measure, cluster analysis, measure (data warehouse), space (punctuation), computer science, mathematics, vector space, data mining, artificial intelligence, pattern recognition (psychology), pure mathematics, chemistry, computational chemistry, image (mathematics), operating system
Measuring the (dis)similarity of molecules is important for many cheminformatics applications, such as compound ranking, clustering, and property prediction. In this work, we focus on real‐valued vector representations of molecules (as opposed to the binary spaces of fingerprints). We demonstrate the influence that the choice of (dis)similarity measure can have on results, and provide recommendations for such choices. We review the mathematical concepts used to measure (dis)similarity in vector spaces, namely norms, metrics, inner products, and similarity coefficients, as well as the relationships between them, using (dis)similarity measures common in cheminformatics as examples. We present several phenomena in high‐dimensional descriptor spaces that are not encountered in two and three dimensions: the empty space phenomenon, sphere volume related phenomena, and distance concentration. These phenomena are theoretically characterized and illustrated on both artificial and real (bioactivity) data. © 2009 Wiley Periodicals, Inc. J Comput Chem, 2009
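
Two of the phenomena named in the abstract, distance concentration and the shrinking volume of the sphere, can be observed numerically. The following is a minimal sketch, not the authors' method or data: it assumes points drawn uniformly from the unit hypercube and Euclidean distance, and the function names `relative_contrast` and `unit_ball_volume` are hypothetical names chosen for this illustration.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Return (d_max - d_min) / d_min for Euclidean distances between
    one random query point and n_points random points, all drawn
    uniformly from [0, 1]^dim (an assumption made for illustration)."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)


def unit_ball_volume(dim):
    """Volume of the unit-radius ball in dim dimensions:
    pi^(dim/2) / Gamma(dim/2 + 1)."""
    return math.pi ** (dim / 2) / math.gamma(dim / 2 + 1)


if __name__ == "__main__":
    for dim in (2, 10, 100, 1000):
        # Fraction of the enclosing cube [-1, 1]^dim filled by the unit ball.
        fill = unit_ball_volume(dim) / 2.0 ** dim
        print(dim, relative_contrast(dim), fill)
```

Under these assumptions, the relative contrast between the farthest and nearest neighbor shrinks as the dimension grows, and the unit ball fills a vanishing fraction of its enclosing cube. Real descriptor data is rarely uniform, so the size of the effect on actual compound sets may differ.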
