When a domain is not a domain, and why it is important to properly filter proteins in databases
Author(s) - Towse Clare-Louise, Daggett Valerie
Publication year - 2012
Publication title - BioEssays
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 2.175
H-Index - 184
eISSN - 1521-1878
pISSN - 0265-9247
DOI - 10.1002/bies.201200116
Subject(s) - domain (mathematical analysis), computer science, protein domain, database, feature (linguistics), protein structure database, protein folding, computational biology, data mining, biology, mathematics, genetics, sequence database, gene, mathematical analysis, linguistics, philosophy, biochemistry
Membership in a protein domain database does not a domain make, a point we realized when generating a consensus view of protein fold space with our consensus domain dictionary (CDD). This dictionary was used to select representative structures for characterization of the protein dynameome in the Dynameomics initiative. Through this endeavor we rejected a surprising 40% of the 1,695 folds in the CDD as being non‐autonomous folding units. Although some of these rejections were due to the challenges of grouping similar fold topologies, the dissonance between the cataloguing and structural qualification of protein domains remains striking. Another potential factor is previously overlooked intrinsic disorder; predictions suggest that 40% of proteins have either local or global disorder. One thing is clear: filtering a structural database and ensuring a consistent definition for protein domains are crucial, and caution is advised when generalizations about globular domains are drawn from unfiltered protein domain datasets.