Open Access
Improved BLAST searches using longer words for protein seeding
Author(s) - Sergey Shiryev, Jason S. Papadopoulos, Alejandro A. Schäffer, Richa Agarwala
Publication year - 2007
Publication title - Bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 3.599
H-Index - 390
eISSN - 1367-4811
pISSN - 1367-4803
DOI - 10.1093/bioinformatics/btm479
Subject(s) - executable, computer science, file transfer protocol, heuristic, word (group theory), code (set theory), operator (biology), data mining, artificial intelligence, information retrieval, programming language, operating system, mathematics, biology, set (abstract data type), geometry, the internet, biochemistry, repressor, transcription factor, gene
The blastp and tblastn modules of BLAST are widely used for searching protein queries against protein and nucleotide databases, respectively. One heuristic used in BLAST is to consider only database sequences that contain a high-scoring match of length at most 5 to the query. We implemented the capability to use words of length 6 or 7. We demonstrate an improved trade-off between running time and retrieval accuracy, controlled by the score threshold used for short word matches. For example, the running time can be reduced by 20-30% while achieving ROC (receiver operating characteristic) scores similar to those obtained with the current default parameters.
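
The seeding heuristic described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the scoring function `toy_score` (+4 for identity, -1 otherwise), the four-letter alphabet, and the function names are all assumptions chosen to keep the example small, whereas BLAST itself uses a substitution matrix such as BLOSUM62 over the 20 amino acids. The sketch only shows how a word length `w` and a score threshold together decide which database positions become seeds.

```python
from itertools import product

# Toy scoring (an assumption, not a real substitution matrix):
# +4 for identical residues, -1 otherwise.
def toy_score(a, b):
    return 4 if a == b else -1

def neighborhood(word, alphabet, threshold):
    """All words of len(word) over `alphabet` scoring >= threshold against `word`."""
    hits = set()
    for cand in product(alphabet, repeat=len(word)):
        score = sum(toy_score(a, b) for a, b in zip(word, cand))
        if score >= threshold:
            hits.add("".join(cand))
    return hits

def seed_hits(query, subject, w, threshold, alphabet):
    """Return (query_offset, subject_offset) pairs where a subject word of
    length w lies in the neighborhood of a query word of length w."""
    # Lookup table: neighborhood word -> query offsets that generated it.
    lookup = {}
    for qi in range(len(query) - w + 1):
        for nb in neighborhood(query[qi:qi + w], alphabet, threshold):
            lookup.setdefault(nb, []).append(qi)
    # Scan the subject once, checking each word against the table.
    hits = []
    for si in range(len(subject) - w + 1):
        for qi in lookup.get(subject[si:si + w], []):
            hits.append((qi, si))
    return hits

ALPHABET = "ACDE"  # hypothetical reduced alphabet for illustration
strict = seed_hits("ACD", "ACE", w=3, threshold=12, alphabet=ALPHABET)
loose = seed_hits("ACD", "ACE", w=3, threshold=7, alphabet=ALPHABET)
```

In this toy setting, raising the threshold shrinks the neighborhood and so produces fewer seeds to extend, while lowering it admits inexact words at the cost of more work. That is the kind of running-time versus accuracy trade-off the abstract says is controlled by the score threshold.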
