Accounting for missing data in the estimation of contemporary genetic effective population size (Ne)
Author(s) - Peel D., Waples R. S., Macbeth G. M., Do C., Ovenden J. R.
Publication year - 2013
Publication title - Molecular Ecology Resources
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 2.96
H-Index - 136
eISSN - 1755-0998
pISSN - 1755-098X
DOI - 10.1111/1755-0998.12049
Subject(s) - missing data, sample size determination, estimator, statistics, imputation (statistics), population, harmonic mean, best linear unbiased prediction, sampling (signal processing), population size, sample (material), biology, effective population size, mathematics, econometrics, computer science, selection (genetic algorithm), genetic variation, genetics, artificial intelligence, chemistry, demography, filter (signal processing), chromatography, sociology, computer vision, gene
Theoretical models are often applied to population genetic data sets without fully considering the effect of missing data. Researchers can deal with missing data by removing individuals that have failed to yield genotypes and/or by removing loci that have failed to yield allelic determinations, but despite their best efforts, most data sets still contain some missing data. As a consequence, realized sample size differs among loci, and this poses a problem for unbiased methods that must explicitly account for random sampling error. One commonly used solution for the calculation of contemporary effective population size (Ne) is to calculate the effective sample size as an unweighted mean or harmonic mean across loci. This is not ideal because it fails to account for the fact that loci with different numbers of alleles have different information content. Here we consider this problem for genetic estimators of contemporary effective population size (Ne). To evaluate bias and precision of several statistical approaches for dealing with missing data, we simulated populations with known Ne and various degrees of missing data. Across all scenarios, one method of correcting for missing data (fixed‐inverse variance‐weighted harmonic mean) consistently performed the best for both single‐sample and two‐sample (temporal) methods of estimating Ne and outperformed some methods currently in widespread use. The approach adopted here may be a starting point to adjust other population genetics methods that include per‐locus sample size components.
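To make the contrast between averaging schemes concrete, the following minimal Python sketch compares an unweighted mean, a harmonic mean, and a generic weighted harmonic mean of per-locus sample sizes. It is an illustration only: the sample sizes and weights are hypothetical, the function name `weighted_harmonic_mean` is introduced here, and the weights are placeholders rather than the fixed inverse-variance weights evaluated in the paper, which the abstract does not specify.

```python
def weighted_harmonic_mean(sizes, weights):
    """Generic weighted harmonic mean: sum(w) / sum(w / s)."""
    if len(sizes) != len(weights):
        raise ValueError("sizes and weights must have the same length")
    if any(s <= 0 for s in sizes):
        raise ValueError("sample sizes must be positive")
    return sum(weights) / sum(w / s for w, s in zip(weights, sizes))


# Hypothetical realized sample sizes at four loci (individuals genotyped);
# they differ because some genotypes are missing.
sizes = [50, 48, 40, 25]

# Hypothetical per-locus weights standing in for information content.
# ASSUMPTION: placeholder values, not the weights used in the paper.
weights = [1.0, 4.0, 2.0, 9.0]

# Unweighted (arithmetic) mean across loci.
arithmetic = sum(sizes) / len(sizes)

# Unweighted harmonic mean: the weighted form with equal weights.
harmonic = weighted_harmonic_mean(sizes, [1.0] * len(sizes))

# Weighted harmonic mean: loci with larger weights contribute more.
weighted = weighted_harmonic_mean(sizes, weights)

print(f"arithmetic mean:        {arithmetic:.2f}")
print(f"harmonic mean:          {harmonic:.2f}")
print(f"weighted harmonic mean: {weighted:.2f}")
```

With these made-up numbers the locus with the smallest sample size carries the largest weight, so the weighted value falls below the unweighted harmonic mean. That shows how the choice of weights changes the effective sample size fed into an Ne estimator.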
