Statistics and Truth in Phylogenomics

Sudhir Kumar; Alan J Filipski; Fabia U. Battistuzzi; Sergei L. Kosakovsky Pond; Koichiro Tamura

Open Access

Publication year - 2011

Publication title - Molecular Biology and Evolution

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 6.637

H-Index - 218

eISSN - 1537-1719

pISSN - 0737-4038

DOI - 10.1093/molbev/msr202

Subject(s) - phylogenomics, biology, inference, phylogenetic tree, evolutionary biology, phylogenetics, statistical inference, statistical hypothesis testing, robustness (evolution), statistics, artificial intelligence, genetics, computer science, clade, mathematics, gene

Phylogenomics refers to the inference of historical relationships among species using genome-scale sequence data and to the use of phylogenetic analysis to infer protein function in multigene families. With rapidly decreasing sequencing costs, phylogenomics is becoming synonymous with evolutionary analysis of genome-scale and taxonomically densely sampled data sets. In phylogenetic inference applications, this translates into very large data sets that yield evolutionary and functional inferences with extremely small variances and high statistical confidence (very small P values). However, highly significant P values are increasingly reported even for contrasting phylogenetic hypotheses, depending on the evolutionary model and inference method used, which makes it difficult to establish true relationships. We argue that assessing the robustness of results to biological factors that may systematically mislead (bias) the outcomes of statistical estimation will be key to avoiding incorrect phylogenomic inferences. In fact, there is a need for increased emphasis on the magnitude of differences (effect sizes) in addition to the P values of statistical tests of the null hypothesis. On the other hand, the amount of sequence data available will likely always remain inadequate for some phylogenomic applications, for example, those involving episodic positive selection at individual codon positions and in specific lineages. Here again, a focus on effect size and biological relevance, rather than on the P value, may be warranted. We present a theoretical overview and discuss practical aspects of the interplay between effect sizes, bias, and P values as it relates to the statistical inference of evolutionary truth in phylogenomics.
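The interplay between effect size, bias, and P values described above can be illustrated with a small numerical sketch. It is not taken from the paper: it assumes a simple two-sided one-sample z-test with unit variance, and the function name `two_sided_p` and the chosen bias of 0.01 standard deviations are hypothetical. The point is only that a fixed, tiny standardized effect (for example, a small systematic bias) yields an ever smaller P value as the number of sites or observations grows, while the effect size itself does not change.

```python
import math


def two_sided_p(effect_size: float, n: int) -> float:
    """Two-sided P value of a z-test (assumed model, unit variance)
    when the observed standardized mean difference equals effect_size."""
    z = effect_size * math.sqrt(n)            # test statistic grows with sqrt(n)
    return math.erfc(abs(z) / math.sqrt(2))   # P(|Z| >= |z|) for standard normal Z


# Hypothetical small systematic bias, in units of standard deviations.
bias = 0.01

for n in (10**3, 10**5, 10**7):
    p = two_sided_p(bias, n)
    print(f"n = {n:>10,d}   effect size = {bias}   P = {p:.3g}")
```

Under these assumptions the same negligible effect is non-significant at a thousand observations and overwhelmingly significant at ten million, which is why reporting the effect size alongside the P value matters for genome-scale data.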
