Discovery of Recurrent Structural Motifs for Approximating Three‐Dimensional Protein Structures
Author(s) - Soong TaTsen, Hwang MingJing, Chen ChungMing
Publication year - 2004
Publication title - Journal of the Chinese Chemical Society
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.329
H-Index - 45
eISSN - 2192-6549
pISSN - 0009-4536
DOI - 10.1002/jccs.200400164
Subject(s) - pentamer, chemistry, algorithm, protein structure, centroid, structural motif, cluster analysis, partition (number theory), threading (protein sequence), cluster (spacecraft), combinatorics, computer science, artificial intelligence, mathematics, biochemistry, programming language
The extent of the conformational space that protein molecules can adopt is a problem of significant interest. Previous studies by other groups have shown that stereochemical constraints confine local protein structures to a limited range of conformations, and many groups have demonstrated that the sequence‐to‐structure relationship remains detectable to some extent at the local level. Studying the conformational space of local protein structures may therefore yield more information about the constraints on local structural space and about the sequence‐to‐structure mapping, and hence facilitate ab initio structure prediction. In this study, we propose a novel algorithm that automatically discovers recurrent pentamer structures in proteins. The algorithm starts by applying Expectation‐Maximization (EM) clustering to the distances between non‐adjacent backbone Cα atoms in a large set of pentamer fragments, which yields a rough partition of the conformation space. In the second stage, a split‐and‐merge algorithm produces a finite number of clusters and guarantees the homogeneity and distinctiveness of each one. Each cluster of protein structures is represented by a centroid structure. The results show that, with 40 major representative structures, we can approximate most of the protein fragments with an error of 0.378 Å. With only 20 types of structures, the fragment structures can still be modeled at 0.44 Å, which is comparable to or better than the performance of previous methods. We term the representatives “building blocks.” On the global level, we demonstrate that by concatenating different combinations of building blocks, we can model whole protein structures at high resolution: a resolution of 2.54 Å can be achieved using only 10 types of building blocks. This finding suggests that the study of molecular structures can be greatly simplified using this reduced representation.
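
The feature used in the first stage, distances between non‐adjacent backbone Cα atoms within each pentamer, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that "non‐adjacent" means a sequence separation of at least two residues (six pairs per pentamer) and that fragments are taken as overlapping five‐residue windows. The function names and the toy chain are hypothetical, and the EM clustering and split‐and‐merge stages are not shown.

```python
import math
from itertools import combinations


def nonadjacent_pairs(n=5):
    """Index pairs (i, j) with j - i >= 2, i.e. residues not adjacent in sequence.

    Assumption: this is what "non-adjacent" means in the abstract.
    """
    return [(i, j) for i, j in combinations(range(n), 2) if j - i >= 2]


def pentamer_features(ca_coords):
    """Slide a 5-residue window along a chain of C-alpha coordinates and
    return one vector of non-adjacent C-alpha distances per window.

    Assumption: overlapping windows; the paper may select fragments differently.
    """
    pairs = nonadjacent_pairs(5)
    features = []
    for start in range(len(ca_coords) - 4):
        window = ca_coords[start:start + 5]
        features.append([math.dist(window[i], window[j]) for i, j in pairs])
    return features


# Hypothetical toy chain: 7 points spaced 3.8 Å apart on a straight line.
chain = [(3.8 * k, 0.0, 0.0) for k in range(7)]
feats = pentamer_features(chain)
```

Each resulting vector has six entries, and a set of such vectors is the kind of input a mixture‐model clustering step would take. Working with internal distances has the practical advantage that the features are unchanged by rotating or translating a fragment, so no superposition is needed before clustering.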
