Efficient toolkit implementing best practices for principal component analysis of population genetic data

Florian Privé, Keurcien Luu, Michaël G. B. Blum, John J. McGrath, Bjarni J. Vilhjálmsson


Open Access

Publication year - 2020

Publication title - Bioinformatics

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 3.599

H-Index - 390

eISSN - 1367-4811

pISSN - 1367-4803

DOI - 10.1093/bioinformatics/btaa520

Subject(s) - principal component analysis, component (thermodynamics), computer science, population, principal (computer security), data mining, artificial intelligence, operating system, medicine, physics, environmental health, thermodynamics

Principal component analysis (PCA) of genetic data is routinely used to infer ancestry and to control for population structure in genetic analyses. However, conducting PCA can be complicated and has several potential pitfalls. These pitfalls include (i) capturing linkage disequilibrium (LD) structure instead of population structure, (ii) projected PCs that suffer from shrinkage bias, (iii) detecting sample outliers and (iv) uneven population sizes. In this work, we explore these potential issues and present efficient solutions to them. Following applications to the UK Biobank and the 1000 Genomes Project datasets, we make recommendations for best practices and provide efficient and user-friendly implementations of the proposed solutions in the R packages bigsnpr and bigutilsr.
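Pitfall (ii), shrinkage bias in projected PCs, can be illustrated with a small simulation. The sketch below is not the bigsnpr or bigutilsr implementation (those are R packages) and does not reproduce the correction proposed in the paper. It is a toy example in plain Python under assumed settings: two simulated populations, many more features than samples, and hypothetical helper names (`simulate`, `top_pc_loadings`, `mean_abs_score`). It fits the first PC on a training set, projects an independent test set drawn from the same populations onto it, and compares the average magnitude of the scores.

```python
import math
import random

def simulate(n_per_pop, p, d, rng):
    """Toy data: two populations with feature means +d and -d, unit-variance noise."""
    rows = []
    for sign in (+1, -1):
        for _ in range(n_per_pop):
            rows.append([sign * d + rng.gauss(0.0, 1.0) for _ in range(p)])
    return rows

def center(X, means):
    """Subtract per-feature means (always the training means) from every row."""
    return [[x - m for x, m in zip(row, means)] for row in X]

def top_pc_loadings(X, rng, n_iter=200):
    """First PC loadings via power iteration on the n x n Gram matrix X X^T."""
    n, p = len(X), len(X[0])
    gram = [[sum(a * b for a, b in zip(X[i], X[j])) for j in range(n)]
            for i in range(n)]
    u = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(n_iter):
        u = [sum(g * x for g, x in zip(row, u)) for row in gram]
        norm = math.sqrt(sum(x * x for x in u))
        u = [x / norm for x in u]
    # Map the left singular vector u back to feature space: v is proportional to X^T u.
    v = [sum(u[i] * X[i][j] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def mean_abs_score(X, v):
    """Average absolute PC1 score of the rows of X."""
    scores = [sum(a * b for a, b in zip(row, v)) for row in X]
    return sum(abs(s) for s in scores) / len(scores)

rng = random.Random(1)
n_per_pop, p, d = 20, 1000, 0.2  # assumed toy settings with p much larger than n

train = simulate(n_per_pop, p, d, rng)
test = simulate(n_per_pop, p, d, rng)  # independent draws, same populations

means = [sum(col) / len(col) for col in zip(*train)]
train_c, test_c = center(train, means), center(test, means)

v = top_pc_loadings(train_c, rng)
train_mag = mean_abs_score(train_c, v)
test_mag = mean_abs_score(test_c, v)
ratio = test_mag / train_mag  # a ratio below 1 indicates shrinkage

print(f"mean |PC1| train: {train_mag:.2f}")
print(f"mean |PC1| projected test: {test_mag:.2f}")
print(f"shrinkage ratio: {ratio:.2f}")
```

In this setup the projected samples tend to sit closer to the origin than the training samples even though both come from the same populations, because the fitted loadings also absorb noise that is specific to the training set. Naively comparing projected individuals with reference individuals on the same PC axes can therefore mislead ancestry inference.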
