Monte Carlo estimation of the number of possible protein folds: Effects of sampling bias and folds distributions

Author(s) - Leonov Hadas, Mitchell Joseph S.B., Arkin Isaiah T.

Publication year - 2003

Publication title - Proteins: Structure, Function, and Bioinformatics

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 1.699

H-Index - 191

eISSN - 1097-0134

pISSN - 0887-3585

DOI - 10.1002/prot.10336

Subject(s) - monte carlo method, statistical physics, statistics, sampling (signal processing), mathematics, econometrics, computer science, physics, filter (signal processing), computer vision

The estimation of the number of protein folds in nature is a matter of considerable interest. In this study, a Monte Carlo method employing the broken stick model is used to assign a given number of proteins to a given number of folds. Subsequently, random, non‐repeating integers are generated to simulate the process of fold discovery. With this conceptual framework in hand, the effects of two factors on the fold identification process were investigated: (1) the nature of the folds distributions and (2) preferential sampling bias toward previously identified folds. Depending on the type of distribution, dividing 100,000 proteins into 1,000 folds resulted in 10–30% of the folds having 10 or fewer proteins per fold, approximately 10% of the folds having 10–20 proteins per fold, 31–45% having 20–100 proteins per fold, and >30% of the folds having more than 100 proteins per fold. After randomly sampling one tenth of the proteins, 68–96% of the folds were identified. These percentages depend on both the folds distribution and whether the sampling is biased or non‐biased. Only upon increasing the sampling bias for previously identified folds to 1,000 did the model result in a reduction of the number of proteins identified by an order of magnitude (approximately 9%). Thus, assuming the structures of one tenth of the population of proteins in nature have been solved, the results of the Monte Carlo simulation are more consistent with recent lower estimates of the number of folds, ≤1,000. Any deviation from this estimate would reflect significant bias in the experimental sampling of protein structure, and/or a substantially nonuniform folds distribution, manifested in a large number of single‐fold proteins. Proteins 2003;51:352–359. © 2003 Wiley‐Liss, Inc.
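
The procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact form of the folds distributions or of the bias, so the sketch assumes the simplest broken stick variant (uniformly random cut points) and treats the bias as a weight multiplier applied to every unsampled protein whose fold has already been identified. The function names `broken_stick` and `discover_folds` and the parameter `bias` are hypothetical.

```python
import random


def broken_stick(n_proteins, n_folds, rng):
    """Split n_proteins into n_folds non-empty groups.

    Assumption: the "stick" is broken at uniformly random, distinct
    integer cut points; the paper may use other distributions.
    """
    cuts = sorted(rng.sample(range(1, n_proteins), n_folds - 1))
    edges = [0] + cuts + [n_proteins]
    return [b - a for a, b in zip(edges, edges[1:])]


def discover_folds(fold_sizes, n_sampled, bias=1, rng=None):
    """Sample proteins without replacement and count the folds seen.

    Assumption: `bias` multiplies the sampling weight of each remaining
    protein that belongs to an already identified fold (bias=1 gives
    uniform, non-biased sampling).
    """
    if n_sampled > sum(fold_sizes):
        raise ValueError("cannot sample more proteins than exist")
    rng = rng or random.Random()
    remaining = list(fold_sizes)          # unsampled proteins per fold
    seen = [False] * len(remaining)       # which folds are identified
    fold_ids = range(len(remaining))
    for _ in range(n_sampled):
        weights = [r * (bias if s else 1) for r, s in zip(remaining, seen)]
        fold = rng.choices(fold_ids, weights=weights)[0]
        remaining[fold] -= 1              # this protein cannot repeat
        seen[fold] = True
    return sum(seen)


if __name__ == "__main__":
    rng = random.Random(2003)
    sizes = broken_stick(10_000, 100, rng)
    for bias in (1, 1000):
        found = discover_folds(sizes, 1_000, bias=bias, rng=rng)
        print(f"bias={bias}: {found} of {len(sizes)} folds identified")
```

The demo uses 10,000 proteins and 100 folds so that it finishes quickly. The figures in the abstract (100,000 proteins, 1,000 folds, one tenth sampled) can be passed in the same way, although this plain-Python loop recomputes every fold weight at each draw and will take noticeably longer at that scale.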
