z-logo
Premium
SketchSort: Fast All Pairs Similarity Search for Large Databases of Molecular Fingerprints
Author(s) -
Tabei Yasuo,
Tsuda Koji
Publication year - 2011
Publication title -
molecular informatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.481
H-Index - 68
eISSN - 1868-1751
pISSN - 1868-1743
DOI - 10.1002/minf.201100050
Subject(s) - similarity (geometry) , nearest neighbor search , computer science , cheminformatics , database , information retrieval , fingerprint (computing) , data mining , computational biology , chemistry , artificial intelligence , biology , computational chemistry , image (mathematics)
Similarity networks of ligands are often reported useful in predicting chemical activities and target proteins. However, the naive method of computing all pairwise similarities of chemical fingerprints takes quadratic time, which is prohibitive for large scale databases with millions of ligands. We propose a fast all pairs similarity search method, called SketchSort, that maps chemical fingerprints to symbol strings with random projections, and finds similar strings by multiple masked sorting. Due to random projection, SketchSort misses a certain fraction of neighbors (i.e., false negatives). Nevertheless, the expected fraction of false negatives is theoretically derived and can be kept under a very small value. Experiments show that SketchSort is much faster than other similarity search methods and enables us to obtain a PubChem‐scale similarity network quickly.

This content is not available in your region!

Continue researching here.

Having issues? You can contact us here