An investigation of error sources and their impact in estimating the time to the most recent ancestor of spatially and temporally distributed HIV sequences

Author(s) - Tom L. Burr, James R. Gattiker, Philip J. Gerrish

Publication year - 2003

Publication title - Statistics in Medicine

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 1.996

H-Index - 183

eISSN - 1097-0258

pISSN - 0277-6715

DOI - 10.1002/sim.1508

Subject(s) - most recent common ancestor, confidence interval, statistics, sequence (biology), population, econometrics, coverage probability, computer science, mathematics, biology, phylogenetic tree, demography, genetics, gene

This is an investigation of significant error sources and their impact in estimating the time to the most recent common ancestor (MRCA) of spatially and temporally distributed human immunodeficiency virus (HIV) sequences. We simulate an HIV epidemic under a range of assumptions with a known time to the MRCA (tMRCA). We then apply a range of baseline (known) evolutionary models to generate sequence data. Next, we estimate or assume one of several misspecified models and use the chosen model to estimate tMRCA. Random effects and the extent of model misspecification determine the magnitude of the error sources, which could include neglected heterogeneity in substitution rates across lineages and DNA sites; uncertainty in HIV isolation times; uncertain magnitude and type of population subdivision; uncertain impacts of host/viral transmission dynamics; and unavoidable model estimation errors. Our results suggest that confidence intervals (CIs) for tMRCA will rarely have the nominal coverage probability. Neglected effects lead to errors that most analyses do not account for, resulting in optimistically narrow CIs. Using real HIV sequences with approximately known isolation times and locations, we present possible CIs under several sets of assumptions. In general, we cannot be certain how much to broaden a stated CI for tMRCA. However, we describe the impact of candidate error sources on CI width, determine which error sources have the most impact on CI width, and demonstrate that the standard bootstrap method will underestimate the CI width. Copyright © 2003 John Wiley & Sons, Ltd.
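The coverage argument can be illustrated with a toy simulation. This is a hypothetical sketch, not the authors' simulation (which generates sequences along a simulated epidemic and fits substitution models). It assumes that the MRCA date is estimated as the x-intercept of a root-to-tip regression of divergence on sampling time, and that sequences from the same sampling location share both a sampling date and an unmodelled divergence offset, a crude stand-in for neglected population subdivision. The constants `K`, `M`, `RATE`, `SD_CLUSTER`, `SD_TIP`, the seed, and the function names are all illustrative assumptions. The sketch compares the standard bootstrap, which resamples sequences independently, with a cluster bootstrap that resamples whole locations, and reports Monte Carlo coverage of the true MRCA time (0 here).

```python
import numpy as np

# Hypothetical toy setting: the MRCA sits at time 0, so a CI "covers" if it contains 0.
K, M = 5, 8          # sampling locations (clusters) and sequences per location
RATE = 0.01          # substitutions/site/year, identical for every lineage (assumption)
SD_CLUSTER = 0.01    # divergence offset shared within a location (neglected subdivision)
SD_TIP = 0.005       # independent per-sequence noise


def simulate(rng):
    """Sampling times and root-to-tip divergences for K locations x M sequences."""
    centers = rng.uniform(20.0, 35.0, size=K)            # each location sampled near one date
    t = np.repeat(centers, M) + rng.uniform(-1.0, 1.0, size=K * M)
    shared = np.repeat(rng.normal(0.0, SD_CLUSTER, size=K), M)
    d = RATE * t + shared + rng.normal(0.0, SD_TIP, size=K * M)
    return t, d


def mrca_time(t, d):
    """x-intercept of the OLS line of divergence on time, one fit per row of 2-D input."""
    tbar = t.mean(axis=-1)
    dbar = d.mean(axis=-1)
    tc = t - tbar[..., None]
    slope = (tc * (d - dbar[..., None])).sum(axis=-1) / (tc ** 2).sum(axis=-1)
    return tbar - dbar / slope


def coverage(n_reps=200, n_boot=400, seed=2003):
    """Monte Carlo coverage and mean width of 95% percentile-bootstrap CIs."""
    rng = np.random.default_rng(seed)
    n = K * M
    hits = {"iid": 0, "cluster": 0}
    widths = {"iid": [], "cluster": []}
    for _ in range(n_reps):
        t, d = simulate(rng)
        # Standard bootstrap: resample individual sequences with replacement.
        idx_iid = rng.integers(0, n, size=(n_boot, n))
        # Cluster bootstrap: resample whole locations (rows k*M .. k*M+M-1).
        picks = rng.integers(0, K, size=(n_boot, K))
        idx_cl = (picks[:, :, None] * M + np.arange(M)).reshape(n_boot, n)
        for name, idx in (("iid", idx_iid), ("cluster", idx_cl)):
            lo, hi = np.percentile(mrca_time(t[idx], d[idx]), [2.5, 97.5])
            hits[name] += int(lo <= 0.0 <= hi)
            widths[name].append(hi - lo)
    return {k: (hits[k] / n_reps, float(np.mean(widths[k]))) for k in hits}


if __name__ == "__main__":
    for name, (cov, width) in coverage().items():
        print(f"{name:8s} coverage={cov:.2f}  mean width={width:.1f} years")
```

Because sequences from one location share a sampling date and an offset, the independent-resampling bootstrap treats 40 correlated points as 40 independent ones, so its intervals come out too narrow and its coverage falls well below 95%. Resampling locations widens the intervals, but with only five locations it may still fall short of nominal coverage, in line with the difficulty of knowing how much a stated interval should be broadened.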
