Practical limits of function prediction
Author(s) - Devos Damien, Valencia Alfonso
Publication year - 2000
Publication title - Proteins: Structure, Function, and Bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.699
H-Index - 191
eISSN - 1097-0134
pISSN - 0887-3585
DOI - 10.1002/1097-0134(20001001)41:1<98::aid-prot120>3.0.co;2-s
Subject(s) - protein function prediction, function (biology), pairwise comparison, sequence (biology), similarity (geometry), identity (music), limit (mathematics), protein function, computational biology, protein sequencing, computer science, mathematics, pattern recognition (psychology), artificial intelligence, biology, peptide sequence, genetics, image (mathematics), physics, mathematical analysis, gene, acoustics
The widening gap between known protein sequences and their functions has led to the practice of assigning a potential function to a protein on the basis of sequence similarity to proteins whose function has been experimentally investigated. We present here a critical view of the theoretical and practical bases for this approach. The results obtained by analyzing a significant number of true sequence similarities, derived directly from structural alignments, point to the complexity of function prediction. Different aspects of protein function, including (i) enzymatic function classification, (ii) functional annotations in the form of key words, (iii) classes of cellular function, and (iv) conservation of binding sites, can be reliably transferred between similar sequences only to a modest degree. The reason for this difficulty is a combination of unavoidable database inaccuracies and the plasticity of protein function. In addition, analysis of the relationship between sequence and functional descriptions defines an empirical limit for pairwise‐based functional annotations: the first three of the six digits used as descriptors of protein folds in the FSSP database can be predicted at an average level as low as 7.5% sequence identity, two of the four EC digits at 15% identity, half of the SWISS‐PROT key words related to protein function would require 20% identity, and the prediction of half of the residues in the binding site can be made at the 30% sequence identity level. Proteins 2000;41:98–107. © 2000 Wiley‐Liss, Inc.
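The empirical limits quoted in the abstract can be read as a small lookup from pairwise sequence identity to the annotation levels that might be transferable. The sketch below is only an illustration of that reading, not the authors' method: the four percentages are taken from the abstract, while the names `Threshold`, `THRESHOLDS`, and `transferable_annotations` are hypothetical. Treating average levels as hard cutoffs is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """One empirical limit: minimum identity (percent) and what it may support."""
    min_identity: float
    description: str


# Percentages quoted from the abstract. They are reported there as average
# levels; using them as strict cutoffs is an assumption of this sketch.
THRESHOLDS = (
    Threshold(7.5, "first three of the six FSSP fold digits"),
    Threshold(15.0, "two of the four EC digits"),
    Threshold(20.0, "half of the SWISS-PROT function key words"),
    Threshold(30.0, "half of the binding-site residues"),
)


def transferable_annotations(identity_percent: float) -> list[str]:
    """Return the annotation levels whose quoted limit is met or exceeded."""
    if not 0.0 <= identity_percent <= 100.0:
        raise ValueError("identity_percent must be between 0 and 100")
    return [
        t.description
        for t in THRESHOLDS
        if identity_percent >= t.min_identity
    ]


if __name__ == "__main__":
    for identity in (5.0, 18.0, 35.0):
        print(identity, transferable_annotations(identity))
```

Under this reading, a pair of proteins at 18% identity would meet the quoted limits for the fold digits and for two EC digits, but not those for key words or binding-site residues.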