Selection of a representative set of structures from Brookhaven Protein Data Bank

Boberg Jorma; Salakoski Tapio; Vihinen Mauno



Author(s) - Boberg Jorma, Salakoski Tapio, Vihinen Mauno

Publication year - 1992

Publication title - Proteins: Structure, Function, and Bioinformatics

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 1.699

H-Index - 191

eISSN - 1097-0134

pISSN - 0887-3585

DOI - 10.1002/prot.340140212

Subject(s) - protein data bank, cluster analysis, pairwise comparison, similarity (geometry), set (abstract data type), structural similarity, sequence (biology), structural alignment, selection (genetic algorithm), sample (material), protein secondary structure, computer science, data mining, protein structure database, data set, sequence alignment, protein structure, biology, artificial intelligence, sequence database, genetics, peptide sequence, physics, image (mathematics), biochemistry, gene, programming language, thermodynamics

Reliable structural and statistical analyses of three-dimensional protein structures should be based on unbiased data. The Protein Data Bank is highly redundant, containing several entries for identical or very similar sequences. A technique was developed for clustering the known structures based on their sequences and their contents of α- and β-structures. First, sequences were aligned pairwise. A representative sample of sequences was then obtained by grouping similar sequences together and selecting a typical representative from each group. The similarity significance threshold needed in the clustering method was found by analyzing similarities of random sequences. Because three-dimensional structures of proteins in the same structural class are generally more conserved than their sequences, the proteins were also clustered according to their contents of secondary structural elements. The results of these clusterings indicate conservation of α- and β-structures even when sequence similarity is relatively low. An unbiased sample of 103 high-resolution structures, representing a wide variety of proteins, was chosen based on the suggestions made by the clustering algorithm. The proteins were divided into structural classes according to their contents and ratios of secondary structural elements. Previous classifications have suffered from a subjective view of secondary structures, whereas here the classification was based on backbone geometry. The concise view led to reclassification of some structures. The representative set of structures facilitates unbiased analyses of relationships between protein sequence, function, and structure, as well as of structural characteristics. © 1992 Wiley‐Liss, Inc.
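The sequence-based steps described in the abstract (pairwise comparison, a significance threshold derived from random sequences, grouping, and choice of a representative) can be illustrated with a minimal Python sketch. It is not the authors' implementation. Every name in it is hypothetical, `difflib.SequenceMatcher` stands in for a real pairwise alignment score, the mean-plus-three-standard-deviations cutoff is an assumed rule, single-linkage grouping is an assumed clustering method, and "most similar to the rest of its group" is an assumed definition of a typical representative.

```python
import random
import statistics
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Stand-in pairwise score in [0, 1]; an assumption, not an alignment score."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def random_threshold(seqs, n_sigma=3.0, trials=200, seed=0):
    """Estimate a cutoff from scores of shuffled (random) sequences.

    The mean + n_sigma * stdev rule is an assumed choice for illustration.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(trials):
        a, b = rng.sample(seqs, 2)
        # Shuffling keeps composition and length but destroys residue order.
        shuffled_a = "".join(rng.sample(a, len(a)))
        shuffled_b = "".join(rng.sample(b, len(b)))
        scores.append(similarity(shuffled_a, shuffled_b))
    return statistics.mean(scores) + n_sigma * statistics.stdev(scores)


def cluster(seqs, threshold):
    """Single-linkage grouping: link every pair scoring at or above threshold."""
    parent = list(range(len(seqs)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if similarity(seqs[i], seqs[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(seqs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def representative(seqs, members):
    """Pick the member with the highest total similarity to its group."""
    return max(
        members,
        key=lambda i: sum(similarity(seqs[i], seqs[j]) for j in members),
    )


if __name__ == "__main__":
    # Toy sequences invented for this example; not taken from the data bank.
    toy = [
        "MKTAYIAKQRQISFVKSHFSRQ",
        "MKTAYIAKQRQISFVKSHFSRA",
        "GSHMLEDPVDAFKWGLLGGC",
        "GSHMLEDPVDAFKWGLLGGA",
    ]
    for members in cluster(toy, threshold=0.8):
        print(members, "representative:", representative(toy, members))
```

In practice the stand-in score would be replaced by a proper alignment score, and a representative could equally be chosen by structure quality, for example resolution, rather than by similarity to the rest of the group.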
