Improving the Search Performance of Extended Connectivity Fingerprints through Activity‐Oriented Feature Filtering and Application of a Bit‐Density‐Dependent Similarity Function
Author(s) -
Hu Ye,
Lounkine Eugen,
Bajorath Jürgen
Publication year - 2009
Publication title -
ChemMedChem
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.817
H-Index - 100
eISSN - 1860-7187
pISSN - 1860-7179
DOI - 10.1002/cmdc.200800408
Subject(s) - similarity (geometry) , nearest neighbor search , feature (linguistics) , pipeline (software) , computer science , bit array , string (physics) , pattern recognition (psychology) , filter (signal processing) , fingerprint (computing) , k nearest neighbors algorithm , artificial intelligence , function (biology) , metric (unit) , data mining , mathematics , image (mathematics) , engineering , computer vision , mechanical engineering , drilling , philosophy , linguistics , operations management , evolutionary biology , biology , mathematical physics , programming language
Improving fingerprint search performance: An activity‐oriented feature‐filtering procedure and a corresponding similarity function were developed for molecule‐specific fingerprints that record ensembles of structural patterns, such as the popular extended connectivity fingerprints. Shown are comparisons of search calculations for cyclooxygenase inhibitors based on k nearest neighbor (1NN, 10NN) and Tanimoto coefficient (Tc) calculations, and the ACF BDM approach introduced herein.

The Pipeline Pilot extended connectivity fingerprints (ECFPs) are currently among the most popular similarity search tools in drug discovery settings. ECFPs do not have a fixed bit string format but generate variable numbers of structural features for individual test molecules. This variable string design makes ECFP representations amenable to compound‐class‐directed modification. We have devised an intuitive feature‐filtering technique that focuses ECFP search calculations on feature string ensembles of given compound activity classes. In combination with a simple bit‐density‐dependent similarity function, feature filtering consistently improved the search performance of ECFP calculations based on Tanimoto similarity and state‐of‐the‐art data fusion techniques on a diverse array of activity classes. Feature filtering and the bit density similarity metric are easily implemented in the Pipeline Pilot environment. The approach provides a viable alternative to conventional similarity searching and should be of general interest to further improve the success rate of practical ECFP applications.
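
The following is a minimal Python sketch of the general ideas the abstract names: Tanimoto similarity on variable-length feature sets, k nearest neighbor scoring against a set of reference compounds, and restricting a database molecule to features seen in an activity class. It is not the authors' implementation. The abstract does not define the actual filtering rule or the bit‐density‐dependent similarity function, so the filter shown here (keep only features that occur in at least one reference compound) is a hypothetical stand-in, plain Tanimoto is used throughout, and averaging the top k similarities is only one common kNN fusion rule. All function names are illustrative, and small integers stand in for ECFP feature identifiers.

```python
from typing import Iterable, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient on feature sets: |A & B| / |A | B|."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def class_features(references: Iterable[Set[int]]) -> Set[int]:
    """Collect every feature seen in the reference (active) compounds.

    Assumption: this union-based rule is a hypothetical filter, not the
    rule used in the paper.
    """
    features: Set[int] = set()
    for ref in references:
        features |= ref
    return features


def filter_features(molecule: Set[int], allowed: Set[int]) -> Set[int]:
    """Keep only the molecule's features that belong to the allowed set."""
    return molecule & allowed


def knn_score(query: Set[int], references: Iterable[Set[int]], k: int = 1) -> float:
    """Average of the k highest Tanimoto similarities to the references.

    k=1 corresponds to 1NN (maximum similarity); k=10 to 10NN.
    """
    sims = sorted((tanimoto(query, ref) for ref in references), reverse=True)
    top = sims[:k]
    return sum(top) / len(top) if top else 0.0


# Toy data: integers stand in for ECFP feature identifiers.
references = [{1, 2, 3}, {2, 3, 4}]
database_molecule = {2, 3, 9, 10}

allowed = class_features(references)
filtered = filter_features(database_molecule, allowed)

unfiltered_score = knn_score(database_molecule, references, k=1)
filtered_score = knn_score(filtered, references, k=1)
```

In this toy case, filtering removes the two features of the database molecule that never occur in the reference class, so its 1NN score rises. Whether such a filter helps on real data depends on the activity class and on the similarity function used.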