Rapid similarity searches of nucleic acid and protein data banks.
Author(s) -
W. John Wilbur,
David J. Lipman
Publication year - 1983
Publication title -
Proceedings of the National Academy of Sciences
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 5.011
H-Index - 771
eISSN - 1091-6490
pISSN - 0027-8424
DOI - 10.1073/pnas.80.3.726
Subject(s) - data bank, nucleic acid, tuple, sequence database, sequence (biology), protein data bank, computer science, similarity (geometry), matching (statistics), sequence alignment, sequence analysis, protein sequencing, computational biology, data mining, peptide sequence, biology, dna, genetics, protein structure, mathematics, biochemistry, artificial intelligence, statistics, gene, discrete mathematics, telecommunications, image (mathematics)
With the development of large data banks of protein and nucleic acid sequences, the need for efficient methods of searching such banks for sequences similar to a given sequence has become evident. We present an algorithm for the global comparison of sequences based on matching k-tuples of sequence elements for a fixed k. The method results in substantial reduction in the time required to search a data bank when compared with prior techniques of similarity analysis, with minimal loss in sensitivity. The algorithm has also been adapted, in a separate implementation, to produce rigorous sequence alignments. Currently, using the DEC KL-10 system, we can compare all sequences in the entire Protein Data Bank of the National Biomedical Research Foundation with a 350-residue query sequence in less than 3 min and carry out a similar analysis with a 500-base query sequence against all eukaryotic sequences in the Los Alamos Nucleic Acid Data Base in less than 2 min.
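The k-tuple matching idea in the abstract can be illustrated with a short Python sketch. It is not the authors' implementation: the function names (`ktuple_index`, `diagonal_counts`, `best_diagonal`), the choice to score by raw match counts per diagonal, and the tie-breaking rule are all assumptions made for illustration. The sketch indexes every k-tuple of the query, scans the target once, and tallies matches by diagonal (target offset minus query offset), so that a diagonal with many matches suggests a region of similarity.

```python
from collections import defaultdict


def ktuple_index(seq, k):
    """Map each k-tuple in seq to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def diagonal_counts(query, target, k):
    """Count k-tuple matches on each diagonal (target offset minus query offset)."""
    index = ktuple_index(query, k)
    counts = defaultdict(int)
    for j in range(len(target) - k + 1):
        # .get() avoids inserting empty entries for k-tuples absent from the query.
        for i in index.get(target[j:j + k], ()):
            counts[j - i] += 1
    return dict(counts)


def best_diagonal(query, target, k):
    """Return (diagonal, match_count) for the richest diagonal, or None if no k-tuple matches.

    Ties are broken in favour of the diagonal closest to zero (an illustrative assumption).
    """
    counts = diagonal_counts(query, target, k)
    if not counts:
        return None
    return max(counts.items(), key=lambda item: (item[1], -abs(item[0])))
```

For example, `best_diagonal("ACGTACGT", "TTACGTACGTTT", 4)` returns `(2, 5)`: the query occurs in the target shifted by two positions, and five of its 4-tuples match along that diagonal. Because the query index is built once and each target position costs one dictionary lookup, the work per data bank entry grows roughly with the entry's length plus the number of matches, which is the kind of saving over full dynamic-programming comparison that the abstract describes.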