Minor allele frequency thresholds strongly affect population structure inference with genomic data sets
Author(s) - Ethan Linck, C. J. Battey
Publication year - 2019
Publication title - Molecular Ecology Resources
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 2.96
H-Index - 136
eISSN - 1755-0998
pISSN - 1755-098X
DOI - 10.1111/1755-0998.12995
Subject(s) - biology, minor allele frequency, inference, allele frequency, population, genetics, population stratification, data set, allele, computational biology, statistics, single nucleotide polymorphism, mathematics, genotype, computer science, artificial intelligence, gene, demography, sociology
A common method of minimizing errors in large DNA sequence data sets is to drop variable sites with a minor allele frequency (MAF) below some specified threshold. Although widespread, this procedure has the potential to alter downstream population genetic inferences and has received relatively little rigorous analysis. Here we use simulations and an empirical single nucleotide polymorphism data set to demonstrate the impacts of MAF thresholds on inference of population structure—often the first step in analysis of population genomic data. We find that model‐based inference of population structure is confounded when singletons are included in the alignment, and that both model‐based and multivariate analyses infer less distinct clusters when more stringent MAF cutoffs are applied. We propose that this behaviour is caused by the combination of a drop in the total size of the data matrix and by correlations between allele frequencies and mutational age. We recommend a set of best practices for applying MAF filters in studies seeking to describe population structure with genomic data.
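
As an illustration of the filtering step the abstract describes, the following is a minimal sketch of a MAF filter on a toy diploid genotype matrix. It is not the authors' pipeline: the function names (`minor_allele_stats`, `filter_sites`), the 0/1/2 genotype coding with `None` for missing calls, and the toy data are all assumptions made for this example. The sketch also exposes a minor allele count cutoff, since dropping singletons corresponds to requiring a count of at least 2.

```python
from typing import List, Optional, Sequence, Tuple

# Assumed coding: 0, 1, or 2 copies of the alternate allele; None = missing call.
Genotype = Optional[int]


def minor_allele_stats(site: Sequence[Genotype]) -> Tuple[int, float]:
    """Return (minor allele count, minor allele frequency) for one diploid site."""
    called = [g for g in site if g is not None]
    n_alleles = 2 * len(called)  # two allele copies per called individual
    if n_alleles == 0:
        return 0, 0.0
    alt = sum(called)
    mac = min(alt, n_alleles - alt)  # the rarer of the two alleles
    return mac, mac / n_alleles


def filter_sites(
    sites: Sequence[Sequence[Genotype]],
    min_maf: float = 0.0,
    min_mac: int = 1,
) -> List[Sequence[Genotype]]:
    """Keep sites that meet both the count and the frequency threshold."""
    kept = []
    for site in sites:
        mac, maf = minor_allele_stats(site)
        if mac >= min_mac and maf >= min_maf:
            kept.append(site)
    return kept


# Toy data: each inner list is one site scored in five diploid individuals.
sites = [
    [0, 0, 0, 0, 1],     # singleton: count 1 of 10 alleles
    [0, 1, 1, 2, 0],     # count 4 of 10 alleles
    [2, 2, 2, 2, 2],     # monomorphic: minor allele count 0
    [0, None, 1, 1, 0],  # one missing call: count 2 of 8 alleles
]

no_singletons = filter_sites(sites, min_mac=2)  # drops the singleton and the monomorphic site
stringent = filter_sites(sites, min_maf=0.3)    # keeps only the second site
```

Comparing `no_singletons` with `stringent` shows the trade-off the abstract points to: a count-based cutoff removes singletons while retaining most sites, whereas a higher frequency cutoff shrinks the data matrix further.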
