Tree and Hashing Data Structures to Speed up Chemical Searches: Analysis and Experiments
Author(s) - Ramzi Nasr, Thomas Kristensen, Pierre Baldi
Publication year - 2011
Publication title - Molecular Informatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.481
H-Index - 68
eISSN - 1868-1751
pISSN - 1868-1743
DOI - 10.1002/minf.201100089
Subject(s) - cheminformatics, computer science, locality sensitive hashing, pruning, chemical space, data mining, chemical database, hash function, tree (set theory), nearest neighbor search, skyline, hash table, theoretical computer science, mathematics, chemistry, computational chemistry, mathematical analysis, biochemistry, computer security, organic chemistry, agronomy, biology, drug discovery
In many large chemoinformatics database systems, molecules are represented by long binary fingerprint vectors whose components record the presence or absence of particular functional groups or combinatorial features. Given a query molecule, the goal is to retrieve every molecule in the database whose similarity to the query exceeds a given threshold. Here we describe a method for speeding up such searches in large databases of small molecules by combining previously developed tree and hashing data structures to prune the search space without introducing any false negatives. More importantly, we provide a mathematical analysis that predicts the level of pruning, and we validate these predictions through simulation experiments.
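
To make the idea of pruning without false negatives concrete, here is a minimal sketch in Python. It is not the tree and hashing method of the paper. It assumes Tanimoto similarity (the abstract does not name the measure) and uses a simpler, well-known bit-count bound: for fingerprints with A and B bits set, the Tanimoto similarity cannot exceed min(A, B) / max(A, B). Any fingerprint whose bound falls below the threshold can be skipped safely. The names `popcount`, `tanimoto`, and `search` are illustrative, fingerprints are stored as Python integers used as bitsets, and the threshold test is taken as "greater than or equal to".

```python
from typing import List, Tuple


def popcount(x: int) -> int:
    """Number of bits set to 1 in a fingerprint stored as a non-negative int."""
    return bin(x).count("1")


def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity: |a AND b| / |a OR b| (assumed similarity measure)."""
    union = popcount(a | b)
    if union == 0:
        return 1.0  # convention chosen here: two empty fingerprints are identical
    return popcount(a & b) / union


def search(query: int, database: List[int], threshold: float) -> Tuple[List[Tuple[int, float]], int]:
    """Return (hits, pruned): hits with similarity >= threshold, and how many
    fingerprints were skipped by the bit-count bound alone."""
    q_bits = popcount(query)
    hits: List[Tuple[int, float]] = []
    pruned = 0
    for idx, fp in enumerate(database):
        f_bits = popcount(fp)
        hi = max(q_bits, f_bits)
        # Upper bound on Tanimoto: intersection <= min count, union >= max count.
        bound = 1.0 if hi == 0 else min(q_bits, f_bits) / hi
        if bound < threshold:
            pruned += 1  # safe to skip: the true similarity cannot reach the threshold
            continue
        sim = tanimoto(query, fp)
        if sim >= threshold:
            hits.append((idx, sim))
    return hits, pruned


if __name__ == "__main__":
    # Toy 8-bit fingerprints, purely illustrative.
    db = [0b1111, 0b1110, 0b0001, 0b11111111]
    hits, pruned = search(0b1111, db, threshold=0.7)
    print(hits, pruned)
```

Because the bound never underestimates the true similarity, the pruned search returns the same hits as an exhaustive scan. Tree and hashing structures such as those the abstract refers to aim to prune far more aggressively while preserving this guarantee.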
