Predicting the similarity search performance of fingerprints and their combination with molecular property descriptors using probabilistic and information theoretic modeling
Author(s) - Vogt, Martin; Nisius, Britta; Bajorath, Jürgen
Publication year - 2009
Publication title - Statistical Analysis and Data Mining: The ASA Data Science Journal
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.381
H-Index - 33
eISSN - 1932-1872
pISSN - 1932-1864
DOI - 10.1002/sam.10035
Subject(s) - fingerprint (computing) , computer science , similarity (geometry) , nearest neighbor search , data mining , precision and recall , property (philosophy) , class (philosophy) , probabilistic logic , artificial intelligence , molecular descriptor , bayesian probability , machine learning , pattern recognition (psychology) , quantitative structure–activity relationship , philosophy , epistemology , image (mathematics)
Similarity searching is currently one of the most widely applied approaches for computationally screening large databases for novel active compounds, and molecular fingerprints are among the most popular search tools. Fingerprint searching has recently also been applied in chemical biology to identify compounds that are selective for one target within a group of related targets. In general, fingerprints are bit string representations of molecular structure and properties, but their design, size, and complexity often vary substantially. Like essentially all similarity search tools, fingerprints display a strong compound class dependence in their ability to identify active molecules and distinguish them from other database compounds. In practical applications, this limitation makes it very difficult to select or prioritize the fingerprints that are most suitable for a given search problem. We have previously (i) devised a Bayesian‐scoring scheme to combine fingerprints and molecular property descriptors for similarity searching and (ii) developed an information‐theoretic approach to predict active compound recall rates for fingerprint searching. Herein, we combine these methods and present an approach for predicting compound recall in search calculations that use Bayesian screening with molecular property descriptors, fingerprints, and their combination. For practical similarity search applications, this analysis is highly relevant because it makes it possible to identify the search methods that are most likely to be successful for a given compound activity class and screening database.

Copyright © 2009 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 2: 123‐134, 2009
