Discovering structural correlations in α‐helices
Author(s) - Klingler Tod M., Brutlag Douglas L.
Publication year - 1994
Publication title - Protein Science
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 3.353
H-Index - 175
eISSN - 1469-896X
pISSN - 0961-8368
DOI - 10.1002/pro.5560031024
Subject(s) - amino acid , sequence (biology) , protein structure , structural motif , computational biology , protein secondary structure , sequence logo , probabilistic logic , conditional independence , computer science , peptide sequence , sequence alignment , algorithm , artificial intelligence , biology , genetics , biochemistry , gene
We have developed a new representation for structural and functional motifs in protein sequences based on correlations between pairs of amino acids and applied it to α‐helical and β‐sheet sequences. Existing probabilistic methods for representing and analyzing protein sequences have traditionally assumed conditional independence of evidence. In other words, amino acids are assumed to have no effect on each other. However, analyses of protein structures have repeatedly demonstrated the importance of interactions between amino acids in conferring both structure and function. Using Bayesian networks, we are able to model the relationships between amino acids at distinct positions in a protein sequence in addition to the amino acid distributions at each position. We have also developed an automated program for discovering sequence correlations using standard statistical tests and validation techniques. In this paper, we test this program on sequences from secondary structure motifs, namely α‐helices and β‐sheets. In each case, the correlations our program discovers correspond well with known physical and chemical interactions between amino acids in structures. Furthermore, we show that, using different chemical alphabets for the amino acids, we discover structural relationships based on the same chemical principle used in constructing the alphabet. This new representation of 3‐dimensional features in protein motifs, such as those arising from structural or functional constraints on the sequence, can be used to improve sequence analysis tools including pattern analysis and database search.
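The abstract does not say which statistical tests the program uses, so the following is only a minimal sketch of the general idea, assuming a Pearson chi-square test of independence between every pair of positions in a set of equal-length aligned sequences. The function names, the toy sequences, and the three-letter charge alphabet are hypothetical illustrations, not the authors' program or data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical reduced "chemical alphabet": side-chain charge near neutral pH.
CHARGE = {"D": "-", "E": "-", "K": "+", "R": "+"}


def recode(seq, alphabet=None):
    """Map residues to a reduced alphabet; unlisted residues become '0'."""
    if alphabet is None:
        return seq
    return "".join(alphabet.get(residue, "0") for residue in seq)


def chi_square(seqs, i, j):
    """Pearson chi-square statistic and degrees of freedom for positions i, j."""
    n = len(seqs)
    pair_counts = Counter((s[i], s[j]) for s in seqs)
    left = Counter(s[i] for s in seqs)
    right = Counter(s[j] for s in seqs)
    stat = 0.0
    for a in left:
        for b in right:
            # Count expected if the two positions were independent.
            expected = left[a] * right[b] / n
            observed = pair_counts.get((a, b), 0)
            stat += (observed - expected) ** 2 / expected
    dof = (len(left) - 1) * (len(right) - 1)
    return stat, dof


def rank_position_pairs(seqs, alphabet=None):
    """Return (statistic, i, j) for all position pairs, strongest first."""
    coded = [recode(s, alphabet) for s in seqs]
    length = len(coded[0])
    scored = [
        (chi_square(coded, i, j)[0], i, j)
        for i, j in combinations(range(length), 2)
    ]
    return sorted(scored, reverse=True)


# Toy sequences invented for illustration: positions 0 and 4 always carry
# opposite charges, as in an i, i+4 salt bridge along a helix.
toy = ["EAAAK", "EALAK", "KAAAE", "KALAE", "EAAAK", "KALAE"]
best_stat, best_i, best_j = rank_position_pairs(toy)[0]
print(best_i, best_j, best_stat)
```

On real data the statistic would be compared against a chi-square distribution with the returned degrees of freedom, with a correction for the number of position pairs tested. Pairs that survive would become candidate edges in a Bayesian network over sequence positions. Passing `alphabet=CHARGE` repeats the search on the reduced alphabet, which is one way to look for correlations driven by a single chemical property.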