Standard Codon Substitution Models Overestimate Purifying Selection for Non-Stationary Data
Author(s) - Benjamin D. Kaehler, Von Bing Yap, Gavin Huttley
Publication year - 2017
Publication title - Genome Biology and Evolution
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.702
H-Index - 74
ISSN - 1759-6653
DOI - 10.1093/gbe/evw308
Subject(s) - nonsynonymous substitution, biology, substitution (logic), synonymous substitution, natural selection, negative selection, evolutionary biology, codon usage bias, molecular evolution, sequence (biology), selection (genetic algorithm), gc content, comparative genomics, model selection, genetics, computational biology, genomics, phylogenetics, gene, genome, statistics, mathematics, computer science, artificial intelligence, programming language
Estimation of natural selection on protein-coding sequences is a key comparative genomics approach for de novo prediction of lineage-specific adaptations. Selective pressure is measured on a per-gene basis by comparing the rate of nonsynonymous substitutions to the rate of synonymous substitutions. All published codon substitution models have been time-reversible and thus assume that sequence composition does not change over time. We previously demonstrated that if time-reversible DNA substitution models are applied in the presence of changing sequence composition, the number of substitutions is systematically biased towards overestimation. We extend these findings to the case of codon substitution models and further demonstrate that the ratio of nonsynonymous to synonymous rates of substitution tends to be underestimated over three data sets of mammals, vertebrates, and insects. Our basis for comparison is a nonstationary codon substitution model that allows sequence composition to change. Goodness-of-fit results demonstrate that our new model tends to fit the data better. Direct measurement of nonstationarity shows that bias in estimates of natural selection and genetic distance increases with the degree of violation of the stationarity assumption. Additionally, inferences drawn under time-reversible models are systematically affected by compositional divergence. As genomic sequences accumulate at an accelerating rate, the importance of accurate de novo estimation of natural selection increases. Our results establish that our new model provides a more robust perspective on this fundamental quantity.
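The quantity the abstract refers to, the ratio of nonsynonymous to synonymous substitution rates, can be illustrated with a minimal Python sketch. This is not the authors' nonstationary model or any published estimator. It only shows how a single-nucleotide codon change is classified under the standard genetic code and how the ratio is formed once the two rates are in hand. The function names `classify_change` and `omega` are hypothetical, and the rate values in the usage lines are made up for illustration.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered with
# bases T, C, A, G and the third position varying fastest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_change(codon_a, codon_b):
    """Classify a change between two codons (hypothetical helper).

    Returns "synonymous" or "nonsynonymous" for single-nucleotide changes
    between sense codons, and None for anything else (identical codons,
    multi-nucleotide changes, or changes involving a stop codon).
    """
    differences = sum(x != y for x, y in zip(codon_a, codon_b))
    if differences != 1:
        return None
    aa_a, aa_b = GENETIC_CODE[codon_a], GENETIC_CODE[codon_b]
    if "*" in (aa_a, aa_b):
        return None
    return "synonymous" if aa_a == aa_b else "nonsynonymous"


def omega(nonsynonymous_rate, synonymous_rate):
    """Ratio of nonsynonymous to synonymous rates (hypothetical helper)."""
    if synonymous_rate <= 0:
        raise ValueError("synonymous rate must be positive")
    return nonsynonymous_rate / synonymous_rate


# Illustrative usage with made-up rates, not values from the paper.
print(classify_change("CTT", "CTC"))  # both codons encode leucine
print(classify_change("ATG", "ATA"))  # methionine to isoleucine
print(omega(1.0, 4.0))
```

A ratio below 1 is conventionally read as evidence of purifying selection, so a model that underestimates the ratio will overstate purifying selection, which is the bias named in the title.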