Massively parallel rRNA gene sequencing exacerbates the potential for biased community diversity comparisons due to variable library sizes
Author(s) - Gihring Thomas M., Green Stefan J., Schadt Christopher W.
Publication year - 2012
Publication title - Environmental Microbiology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.954
H-Index - 188
eISSN - 1462-2920
pISSN - 1462-2912
DOI - 10.1111/j.1462-2920.2011.02550.x
Subject(s) - pyrosequencing, biology, estimator, ribosomal RNA, computational biology, species richness, genetics, evolutionary biology, gene, sample (material), sample size determination, statistics, ecology, chemistry, mathematics, chromatography
Summary - Technologies for massively parallel sequencing are revolutionizing microbial ecology and are vastly increasing the scale of ribosomal RNA (rRNA) gene studies. Although pyrosequencing has increased the breadth and depth of possible rRNA gene sampling, one drawback is that the number of reads obtained per sample is difficult to control. Pyrosequencing libraries typically vary widely in the number of sequences per sample, even within individual studies, and there is a need to revisit the behaviour of richness estimators and diversity indices with variable gene sequence library sizes. Multiple reports and review papers have demonstrated the bias in non‐parametric richness estimators (e.g. Chao1 and ACE) and diversity indices when using clone libraries. However, we found that biased community comparisons are accumulating in the literature. Here we demonstrate the effects of sample size on Chao1, ACE, CatchAll, Shannon, Chao–Shen and Simpson's estimations specifically using pyrosequencing libraries. The need to equalize the number of reads being compared across libraries is reiterated, and investigators are directed towards available tools for making unbiased diversity comparisons.
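The central recommendation of the summary, equalizing read counts before comparing libraries, can be sketched as follows. This is a minimal illustration in Python, not the authors' code: the OTU count tables `lib_small` and `lib_large`, the helper names `rarefy`, `chao1` and `shannon`, and the fixed random seed are all assumptions introduced here. The sketch uses the bias-corrected form of Chao1, S_obs + F1(F1 - 1) / (2(F2 + 1)), where F1 and F2 are the numbers of OTUs observed exactly once and exactly twice.

```python
import math
import random
from collections import Counter


def rarefy(counts, depth, rng):
    """Draw `depth` reads without replacement from an OTU count table."""
    # Expand the table into one entry per read, then subsample.
    reads = [otu for otu, n in counts.items() for _ in range(n)]
    if depth > len(reads):
        raise ValueError("depth exceeds library size")
    return Counter(rng.sample(reads, depth))


def chao1(counts):
    """Bias-corrected Chao1 richness: S_obs + F1*(F1-1) / (2*(F2+1))."""
    abund = [n for n in counts.values() if n > 0]
    f1 = sum(1 for n in abund if n == 1)  # singletons
    f2 = sum(1 for n in abund if n == 2)  # doubletons
    return len(abund) + f1 * (f1 - 1) / (2 * (f2 + 1))


def shannon(counts):
    """Shannon index H = -sum(p_i * ln p_i) over observed OTUs."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total)
                for n in counts.values() if n > 0)


# Hypothetical libraries of very different sizes (illustrative data only).
lib_small = {"otu1": 40, "otu2": 30, "otu3": 20, "otu4": 6,
             "otu5": 2, "otu6": 1, "otu7": 1}
lib_large = {"otu1": 400, "otu2": 300, "otu3": 200, "otu4": 60,
             "otu5": 20, "otu6": 10, "otu7": 5, "otu8": 2,
             "otu9": 2, "otu10": 1}

# Equalize both libraries to the size of the smallest one.
depth = min(sum(lib_small.values()), sum(lib_large.values()))
rng = random.Random(0)  # fixed seed so the sketch is repeatable
rare_small = rarefy(lib_small, depth, rng)
rare_large = rarefy(lib_large, depth, rng)

for name, lib in [("small", rare_small), ("large", rare_large)]:
    print(name, sum(lib.values()), chao1(lib), round(shannon(lib), 3))
```

Comparing `chao1(lib_large)` directly with `chao1(lib_small)` would confound any real difference in richness with the difference in sequencing depth; comparing the rarefied tables removes the depth difference at the cost of discarding reads. A single random draw is shown here for brevity; repeating the subsampling and averaging the estimates would reduce the noise introduced by the draw itself.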