SANS: high-throughput retrieval of protein sequences allowing 50% mismatches
Author(s) - Jouko Koskinen, Liisa Holm
Publication year - 2012
Publication title - Bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 3.599
H-Index - 390
eISSN - 1367-4811
pISSN - 1367-4803
DOI - 10.1093/bioinformatics/bts417
Subject(s) - computer science, nearest neighbor search, sequence (biology), filter (signal processing), genome, annotation, similarity (geometry), software, structural genomics, sequence alignment, information retrieval, computational biology, data mining, gene, biology, protein structure, genetics, artificial intelligence, peptide sequence, image (mathematics), computer vision, programming language, biochemistry
The genomic era in molecular biology has brought on a rapidly widening gap between the amount of sequence data and first-hand experimental characterization of proteins. Fortunately, the theory of evolution provides a simple solution: functional and structural information can be transferred between homologous proteins. Sequence similarity searching followed by k-nearest neighbor classification is the most widely used tool to predict the function or structure of anonymous gene products that come out of genome sequencing projects.
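The pipeline named in the abstract, a similarity search followed by k-nearest neighbor classification, can be sketched in a few lines. This is a toy illustration only, not the SANS method: the similarity measure (Jaccard overlap of 3-mers) is a stand-in for a real alignment score, and the function names, sequences, and labels are all hypothetical.

```python
from collections import Counter


def kmers(seq, k=3):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a, b, k=3):
    """Jaccard similarity of k-mer sets (assumed toy stand-in for an alignment score)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def knn_annotate(query, database, n_neighbors=3):
    """Predict a label by majority vote among the n most similar database entries.

    database is a list of (sequence, label) pairs.
    """
    ranked = sorted(
        database,
        key=lambda entry: similarity(query, entry[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:n_neighbors])
    return votes.most_common(1)[0][0]


# Hypothetical toy database: sequences and labels are made up for illustration.
toy_db = [
    ("MKTAYIAKQRQISFVKSHFSRQ", "kinase"),
    ("MKTAYIAKQRQISFVKSHFARQ", "kinase"),
    ("MKTAYLAKQRQISFVKSHFSRQ", "kinase"),
    ("GSSHHHHHHSSGLVPRGSHMAS", "tag"),
    ("MALWMRLLPLLALLALWGPDPA", "hormone"),
]

prediction = knn_annotate("MKTAYIAKQRQISFVKSHFSRA", toy_db)
print(prediction)
```

In this toy setup the query shares most of its 3-mers with the three "kinase" entries and none with the other two, so the majority vote assigns it "kinase". A real annotation pipeline would replace `similarity` with a proper homology search and would typically weight votes by score.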