Equiprobable discrete models of site-specific substitution rates underestimate the extent of rate variability
Author(s) - Frank Mannino, Sadie Wisotsky, Sergei L. Kosakovsky Pond, Spencer V. Muse
Publication year - 2020
Publication title - PLOS ONE
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.99
H-Index - 332
ISSN - 1932-6203
DOI - 10.1371/journal.pone.0229493
Subject(s) - substitution (logic), statistics, upper and lower bounds, discretization, mathematics, inference, distribution (mathematics), variance (accounting), sequence (biology), econometrics, computer science, biology, mathematical analysis, economics, artificial intelligence, programming language, accounting, genetics
It is standard practice to model site-to-site variability of substitution rates by discretizing a continuous distribution into a small number, K, of equiprobable rate categories. We demonstrate that the variance of this discretized distribution has an upper bound determined solely by the choice of K and the mean of the distribution. This bound can introduce biases into statistical inference, especially when estimating parameters governing site-to-site variability of substitution rates. Applications to two large collections of sequence alignments demonstrate that this upper bound is often reached in analyses of real data. When parameter estimation is of primary interest, additional rate categories or more flexible modeling methods should be considered.
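
The bound described in the abstract can be illustrated with a short sketch. The reasoning here is an assumption rather than a quotation from the paper: if K non-negative rates each carry probability 1/K and average to a mean m, their variance is largest when all of the rate mass sits in a single category (one rate equal to K·m, the rest equal to 0), which gives a variance of (K - 1)·m². The Python below checks this on a gamma distribution with mean 1. The function names, the choice of shape parameter, and the Monte Carlo discretization by category means are all hypothetical choices made for illustration, not the authors' procedure.

```python
import random
import statistics


def variance_upper_bound(k, mean=1.0):
    """Largest variance attainable by k equiprobable non-negative rates
    with the given mean (assumed form of the bound: (k - 1) * mean**2)."""
    return (k - 1) * mean ** 2


def equiprobable_rates(samples, k):
    """Approximate a k-category equiprobable discretization by sorting the
    samples, splitting them into k equal-sized bins, and using each bin's
    mean as that category's rate. len(samples) should be divisible by k."""
    ordered = sorted(samples)
    size = len(ordered) // k
    return [statistics.fmean(ordered[i * size:(i + 1) * size]) for i in range(k)]


def discrete_variance(rates):
    """Variance of a distribution that puts equal probability on each rate."""
    return statistics.pvariance(rates)


# Illustrative setup: gamma distribution with shape alpha and scale 1/alpha,
# so the continuous mean is 1 and the continuous variance is 1/alpha = 20.
rng = random.Random(0)
k = 4
alpha = 0.05
samples = [rng.gammavariate(alpha, 1.0 / alpha) for _ in range(200_000)]

rates = equiprobable_rates(samples, k)
bound = variance_upper_bound(k, statistics.fmean(rates))

print("category rates:      ", rates)
print("discretized variance:", discrete_variance(rates))
print("upper bound (K-1)m^2:", bound)
```

With these settings the continuous distribution has variance 20, while the four-category discretization cannot exceed roughly 3, so any inference based on the discretized rates cannot represent the full variability. Raising k in the sketch raises the ceiling, which is consistent with the abstract's recommendation to consider additional rate categories.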
