Key residues approach to the definition of protein families and analysis of sparse family signatures
Author(s) -
Ison Jon C.,
Blades Matthew J.,
Bleasby Alan J.,
Daniel Stephen C.,
Parish J. Howard,
Findlay John B.C.
Publication year - 2000
Publication title -
Proteins: Structure, Function, and Bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.699
H-Index - 191
eISSN - 1097-0134
pISSN - 0887-3585
DOI - 10.1002/(sici)1097-0134(20000801)40:2<330::aid-prot120>3.0.co;2-3
Subject(s) - signature (topology) , protein family , computational biology , protein data bank , residue (chemistry) , motif (music) , structural motif , protein structure , protein sequencing , sequence motif , sequence alignment , protein data bank (rcsb pdb) , pattern recognition (psychology) , computer science , biology , data mining , peptide sequence , artificial intelligence , genetics , mathematics , biochemistry , physics , geometry , gene , acoustics , dna
We extend the concept of the motif as a tool for characterizing protein families and explore the feasibility of a sparse “motif” that is the length of the protein sequence itself. The type of motif discussed is a sparse family signature consisting of a set of N key residue positions (A1, A2, …, AN), each preceded by a gap (G), thus G1A1G2A2…GNAN. Both the residue and the gap can be variable. A signature is matched to a protein sequence and scored using a dynamic programming algorithm that permits variability in gap distance and residue type. Generating a signature involves identifying residues associated with points of contact in interactions between secondary structure elements. A raw signature consists of a set of positions with potential key structural roles, sampled from a sequence alignment constructed with reference to these contact data. Raw signatures are refined by sampling different gap‐residue pairs until the specificity of a signature for the family cannot be further improved. We summarize signatures for nine protein families of diverse fold and function and present results of scans against the OWL protein sequence database. The implications of such signatures are discussed. Proteins 2000;40:330–341. © 2000 Wiley‐Liss, Inc.
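The matching step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives neither the scoring scheme nor the data layout, so everything here is an assumption made for illustration. Each signature element is taken to be a pair of lookup tables, one mapping allowed gap lengths to scores and one mapping allowed residues to scores, and the function name `score_signature` and the example values are hypothetical.

```python
from math import inf


def score_signature(signature, sequence):
    """Return (best_score, positions) for a sparse signature on a sequence.

    signature: list of (gap_scores, residue_scores) pairs, one per key
    position. gap_scores maps an allowed gap length (number of residues
    skipped before the key position) to a score; residue_scores maps an
    allowed residue to a score. Both tables are illustrative assumptions.
    Returns (-inf, []) when no alignment is possible.
    """
    n, m = len(signature), len(sequence)
    if n == 0 or m == 0:
        return -inf, []
    # best[i][j]: best score with key position i placed at sequence index j.
    best = [[-inf] * m for _ in range(n)]
    # back[i][j]: sequence index chosen for key position i - 1.
    back = [[-1] * m for _ in range(n)]
    for i, (gap_scores, residue_scores) in enumerate(signature):
        for j, residue in enumerate(sequence):
            if residue not in residue_scores:
                continue
            r = residue_scores[residue]
            if i == 0:
                # The first gap is the number of residues before index j.
                if j in gap_scores:
                    best[0][j] = gap_scores[j] + r
                continue
            for gap, g in gap_scores.items():
                k = j - gap - 1  # index of the previous key position
                if k >= 0 and best[i - 1][k] > -inf:
                    candidate = best[i - 1][k] + g + r
                    if candidate > best[i][j]:
                        best[i][j] = candidate
                        back[i][j] = k
    end = max(range(m), key=lambda j: best[n - 1][j])
    if best[n - 1][end] == -inf:
        return -inf, []
    # Trace back the chosen index for every key position.
    positions = [end]
    for i in range(n - 1, 0, -1):
        positions.append(back[i][positions[-1]])
    positions.reverse()
    return best[n - 1][end], positions


# Hypothetical two-position signature: G1 A1 G2 A2.
example_signature = [
    ({0: 1, 1: 2}, {"A": 3}),
    ({2: 2, 3: 1}, {"C": 3, "S": 1}),
]
example_result = score_signature(example_signature, "GAKLCS")
```

For the hypothetical signature above, `example_result` holds the best total score and the zero-based sequence indices matched by the two key positions. Storing allowed gaps and residues as dictionaries keeps the inner loop proportional to the number of permitted gap lengths rather than to the sequence length, which suits a sparse signature; a real scan would presumably use substitution-matrix scores and a gap penalty function in place of these toy tables.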