Dipeptide frequency/bias analysis identifies conserved sites of nonrandomness shared by cysteine‐rich motifs
Author(s) - Campion Stephen R., Ameen Abdullah S., Lai Longsheng, King Jeniffer M., Munzenmaier Tracy N.
Publication year - 2001
Publication title - Proteins: Structure, Function, and Bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.699
H-Index - 191
eISSN - 1097-0134
pISSN - 0887-3585
DOI - 10.1002/prot.1097
Subject(s) - dipeptide, sequence motif, consensus sequence, peptide sequence, amino acid, cysteine, conserved sequence, biology, motif (music), sequence analysis, structural motif, genetics, sequence (biology), computational biology, biochemistry, enzyme, gene, physics, acoustics
This report describes the application of a simple computational tool, AAPAIR.TAB, for the systematic analysis of the cysteine‐rich EGF, Sushi, and Laminin motif/sequence families at the two‐amino acid level. Automated dipeptide frequency/bias analysis detects preferences in the distribution of amino acids in established protein families by determining which “ordered dipeptides” occur most frequently in comprehensive motif‐specific sequence data sets. Graphic display of the dipeptide frequency/bias data revealed family‐specific preferences for certain dipeptides but, more importantly, detected a shared preference for the ordered dipeptides Gly–Tyr (GY) and Gly–Phe (GF) in all three protein families. The dipeptide Asn–Gly (NG) also exhibited high frequency and bias in the EGF and Sushi motif families, whereas Asn–Thr (NT) was distinguished in the Laminin family. Evaluation of the distribution of dipeptides identified by frequency/bias analysis subsequently revealed the highly restricted localization of the G(F/Y) and N(G/T) sequence elements at two separate sites of extreme conservation in the consensus sequence of all three sequence families. The similar employment of the high‐frequency/bias dipeptides in three distinct protein sequence families was further correlated with the occurrence of these shared molecular determinants at similar positions within the distinctive scaffolds of three structurally divergent, but similarly employed, motif modules. Proteins 2001;44:321–328. © 2001 Wiley‐Liss, Inc.
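The abstract does not specify how AAPAIR.TAB computes frequency or bias, so the following Python sketch is only an illustration of the general idea of ordered-dipeptide analysis, not the authors' implementation. It assumes that "frequency" is the share of all adjacent residue pairs taken up by a given ordered dipeptide, and that "bias" is the ratio of that observed frequency to the frequency expected if the two residues paired independently. The function names and the toy sequences are hypothetical and are not drawn from the EGF, Sushi, or Laminin data sets.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def dipeptide_counts(sequences):
    """Count ordered dipeptides (adjacent residue pairs) across all sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for first, second in zip(seq, seq[1:]):
            # Skip pairs containing gaps or non-standard symbols.
            if first in AMINO_ACIDS and second in AMINO_ACIDS:
                counts[first + second] += 1
    return counts


def residue_frequencies(sequences):
    """Return the relative frequency of each standard residue in the data set."""
    counts = Counter()
    for seq in sequences:
        counts.update(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {aa: n / total for aa, n in counts.items()}


def dipeptide_bias(sequences):
    """Return {dipeptide: (frequency, bias)}.

    Assumed definition (not taken from the paper): bias is the observed
    dipeptide frequency divided by the product of the two residue frequencies.
    """
    pair_counts = dipeptide_counts(sequences)
    total_pairs = sum(pair_counts.values())
    freqs = residue_frequencies(sequences)
    result = {}
    for pair, n in pair_counts.items():
        observed = n / total_pairs
        expected = freqs[pair[0]] * freqs[pair[1]]
        result[pair] = (observed, observed / expected)
    return result


if __name__ == "__main__":
    # Made-up toy sequences, for illustration only.
    toy = ["CGYC", "CGFC"]
    ranked = sorted(dipeptide_bias(toy).items(), key=lambda kv: kv[1][1], reverse=True)
    for pair, (freq, bias) in ranked:
        print(f"{pair}\tfrequency={freq:.3f}\tbias={bias:.2f}")
```

Ranking by bias rather than raw frequency keeps dipeptides built from abundant residues (such as cysteine in these families) from dominating the list simply because their components are common.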