Effects of input data quantity on genome-wide association studies (GWAS)
Author(s) - N.A. Yan, Connor Burbridge, Jinhong Shi, Juxin Liu, Anthony Kusalik
Publication year - 2019
Publication title - International Journal of Data Mining and Bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.214
H-Index - 21
eISSN - 1748-5681
pISSN - 1748-5673
DOI - 10.1504/ijdmb.2019.099286
Subject(s) - genome wide association study, single nucleotide polymorphism, genetic association, context (archaeology), snp, statistics, correlation, computational biology, computer science, biology, data mining, genetics, mathematics, gene, genotype, paleontology, geometry
Many software packages based on various statistical models have been developed for genome-wide association studies (GWAS). One key factor influencing the statistical reliability of GWAS is the amount of input data used. In this paper, we investigate how input data quantity influences the output of four widely used GWAS programs (PLINK, TASSEL, GAPIT, and FaST-LMM) in the context of plant genomes and phenotypes. Both synthetic and real data are used. Evaluation is based on the p- and q-values of output SNPs and on the Kendall rank correlation between output SNP lists. Results show that, for the same GWAS program, different Arabidopsis thaliana datasets show similar trends in rank correlation as input quantity varies, but differ in the number of SNPs passing a given p- or q-value threshold. We also show that varying the number of replicates influences the p-values of SNPs but does not strongly affect the rank correlation.
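
The two evaluation criteria named in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the SNP identifiers and p-values are made up, the function names are hypothetical, Kendall's tau is computed in its simplest form (tau-a, assuming no tied p-values), and Benjamini-Hochberg adjustment is used as a stand-in because the abstract does not say how q-values were obtained.

```python
from itertools import combinations


def ranks_from_pvalues(pvalues):
    """Map each SNP id to its rank (1 = smallest p-value). Assumes no ties."""
    ordered = sorted(pvalues, key=pvalues.get)
    return {snp: i + 1 for i, snp in enumerate(ordered)}


def kendall_tau(rank_a, rank_b):
    """Kendall tau-a over the SNPs present in both rankings."""
    common = sorted(set(rank_a) & set(rank_b))
    if len(common) < 2:
        raise ValueError("need at least two shared SNPs")
    concordant = discordant = 0
    for s, t in combinations(common, 2):
        sign = (rank_a[s] - rank_a[t]) * (rank_b[s] - rank_b[t])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(common) * (len(common) - 1) // 2
    return (concordant - discordant) / n_pairs


def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (an assumed q-value definition)."""
    m = len(pvalues)
    # Walk from the largest p-value down, keeping a running minimum.
    ordered = sorted(pvalues, key=pvalues.get, reverse=True)
    q, running_min = {}, 1.0
    for i, snp in enumerate(ordered):
        rank = m - i  # rank of this SNP in ascending p-value order
        running_min = min(running_min, pvalues[snp] * m / rank)
        q[snp] = running_min
    return q


def count_passing(values, threshold):
    """Number of SNPs whose p- or q-value is at or below the threshold."""
    return sum(1 for v in values.values() if v <= threshold)


# Hypothetical p-values from one GWAS program run on a full dataset
# and on a reduced (half-size) dataset.
full = {"snp1": 0.001, "snp2": 0.01, "snp3": 0.03, "snp4": 0.5}
half = {"snp1": 0.02, "snp2": 0.01, "snp3": 0.2, "snp4": 0.6}

tau = kendall_tau(ranks_from_pvalues(full), ranks_from_pvalues(half))
n_full = count_passing(bh_qvalues(full), 0.05)
n_half = count_passing(bh_qvalues(half), 0.05)
print(f"tau={tau:.3f}, passing q<=0.05: full={n_full}, half={n_half}")
```

In this toy case the ranking changes only slightly between the two runs, while the number of SNPs passing the q-value threshold drops, which is the kind of contrast between rank correlation and threshold counts that the abstract describes.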
