Statistical potential‐based amino acid similarity matrices for aligning distantly related protein sequences
Author(s) - Tan Yen Hock, Huang He, Kihara Daisuke
Publication year - 2006
Publication title - Proteins: Structure, Function, and Bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.699
H-Index - 191
eISSN - 1097-0134
pISSN - 0887-3585
DOI - 10.1002/prot.21020
Subject(s) - similarity (geometry), protein structure prediction, structural alignment, context (archaeology), amino acid, structural similarity, benchmark (surveying), computational biology, template, sequence alignment, amino acid residue, protein structure, protein methods, computer science, bioinformatics, peptide sequence, biology, artificial intelligence, genetics, geography, biochemistry, paleontology, geodesy, gene, image (mathematics), programming language
Aligning distantly related protein sequences is a long‐standing problem in bioinformatics and a key to successful protein structure prediction. Its importance has grown recently in the context of structural genomics projects, because more and more experimentally solved structures are becoming available as templates for protein structure modeling. To this end, recent structure prediction methods employ profile–profile alignments, and various ways of aligning two profiles have been developed. More fundamentally, a better amino acid similarity matrix can improve the profile itself, thereby yielding more accurate profile–profile alignments. Here we develop novel amino acid similarity matrices from knowledge‐based amino acid contact potentials. Contact potentials are used because the contact propensity of a residue toward the other amino acids is expected to be one of the most conserved features of each position in a protein structure. The derived amino acid similarity matrices are tested on benchmark alignments at three levels: the family, the superfamily, and the fold level. Compared with BLOSUM45 and other existing matrices, the contact potential‐based matrices perform comparably in family‐level alignments but clearly outperform them in fold‐level alignments. The contact potential‐based matrices perform even better when suboptimal alignments are considered. Comparing the matrices with one another reveals that the contact potential‐based matrices are very different from BLOSUM45 and the other matrices, indicating that they are located in a different basin of the amino acid similarity matrix space. Proteins 2006. © 2006 Wiley‐Liss, Inc.
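The abstract does not spell out how a similarity matrix is obtained from a contact potential. As a minimal sketch of the general idea only, and not necessarily the authors' procedure, one could treat each row of a contact potential as that residue's contact propensity profile and score two residues by the Pearson correlation of their profiles. The function names, the scaling factor, and the four-residue toy potential below are all hypothetical and chosen purely for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric vectors.

    Assumes neither vector is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def similarity_from_contacts(potential, scale=10):
    """Build an integer similarity matrix from a square contact potential.

    Entry (i, j) is the scaled, rounded correlation between the contact
    profiles (rows) of residue types i and j. This is an illustrative
    assumption, not the published derivation.
    """
    n = len(potential)
    return [
        [round(scale * pearson(potential[i], potential[j])) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical toy contact energies (negative = favourable contact).
# Real potentials cover all 20 amino acids; these numbers are made up.
RESIDUES = ["L", "V", "K", "E"]
TOY_POTENTIAL = [
    [-1.0, -0.8, 0.3, 0.4],   # L
    [-0.8, -0.7, 0.2, 0.3],   # V
    [0.3, 0.2, 0.5, -0.6],    # K
    [0.4, 0.3, -0.6, 0.6],    # E
]

sim = similarity_from_contacts(TOY_POTENTIAL)

# Print the matrix with residue labels.
print("   " + "  ".join(f"{r:>3}" for r in RESIDUES))
for residue, row in zip(RESIDUES, sim):
    print(f"{residue:>3}" + "  ".join(f"{v:>3}" for v in row))
```

In this toy setting the two hydrophobic residues have nearly identical contact profiles and therefore score as highly similar, while a hydrophobic and a charged residue score negatively. Because the score depends only on how a residue contacts others, such a matrix can diverge substantially from substitution-frequency matrices such as BLOSUM45.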