On the cross-population generalizability of gene expression prediction models

Open Access

Author(s) - Kevin L. Keys, Angel C. Y. Mak, Marquitta J. White, Walter L. Eckalbar, Andrew Dahl, Joel Mefford, Anna V. Mikhaylova, María G. Contreras, Jennifer R. Elhawary, Celeste Eng, Donglei Hu, Scott Huntsman, Sam S. Oh, Sandra Salazar, Michael A. LeNoir, Jimmie C. Ye, Timothy A. Thornton, Noah Zaitlen, Esteban G. Burchard, Christopher R. Gignoux

Publication year - 2020

Publication title - PLOS Genetics

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 3.587

H-Index - 233

eISSN - 1553-7404

pISSN - 1553-7390

DOI - 10.1371/journal.pgen.1008927

Subject(s) - generalizability theory, biology, population, computational biology, expression quantitative trait loci, RNA-Seq, transcriptome, genetic architecture, replicate, gene, gene expression, genetics, evolutionary biology, single nucleotide polymorphism, genotype, quantitative trait locus, statistics, demography, mathematics, sociology

The genetic control of gene expression is a core component of human physiology. For the past several years, transcriptome-wide association studies have leveraged large datasets of linked genotype and RNA sequencing information to create a powerful gene-based test of association that has been used in dozens of studies. While numerous discoveries have been made, the populations in the training data are overwhelmingly of European descent, and little is known about the generalizability of these models to other populations. Here, we test for cross-population generalizability of gene expression prediction models using a dataset of African American individuals with RNA-Seq data in whole blood. We find that the default models trained in large datasets such as GTEx and DGN fare poorly in African Americans, with a notable reduction in prediction accuracy when compared to European Americans. We replicate these limitations in cross-population generalizability using the five populations in the GEUVADIS dataset. Via realistic simulations of both populations and gene expression, we show that accurate cross-population generalizability of transcriptome prediction only arises when eQTL architecture is substantially shared across populations. In contrast, models with non-identical eQTLs showed patterns similar to real-world data. Therefore, generating RNA-Seq data in diverse populations is a critical step towards multi-ethnic utility of gene expression prediction.
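The contrast between shared and non-identical eQTL architecture can be illustrated with a toy simulation. This is a minimal sketch under strong simplifying assumptions and is not the simulation design used in the paper: SNPs are drawn independently (no linkage disequilibrium), allele frequencies are arbitrary uniform draws, and plain ridge regression stands in for whatever penalized model a real expression predictor would use. All function names and parameter values are hypothetical.

```python
import numpy as np


def simulate_genotypes(rng, n, maf):
    """Draw allele counts (0/1/2) for n individuals at independent SNPs."""
    return rng.binomial(2, maf, size=(n, maf.size)).astype(float)


def simulate_expression(rng, genotypes, effects, h2):
    """Expression = genetic component + noise scaled so heritability is about h2."""
    genetic = genotypes @ effects
    noise_sd = np.sqrt(genetic.var() * (1.0 - h2) / h2)
    return genetic + rng.normal(0.0, noise_sd, size=genetic.shape)


def fit_ridge(genotypes, expression, penalty=1.0):
    """Closed-form ridge regression on mean-centered data (assumed stand-in model)."""
    x = genotypes - genotypes.mean(axis=0)
    y = expression - expression.mean()
    gram = x.T @ x + penalty * np.eye(x.shape[1])
    return np.linalg.solve(gram, x.T @ y)


def prediction_r2(genotypes, expression, weights):
    """Squared correlation between predicted and simulated expression."""
    predicted = genotypes @ weights
    return float(np.corrcoef(predicted, expression)[0, 1] ** 2)


def run_demo(seed=0, n=1000, n_snps=50, n_causal=5, h2=0.5):
    rng = np.random.default_rng(seed)

    # Two hypothetical populations with different allele frequencies.
    maf_a = rng.uniform(0.1, 0.5, n_snps)
    maf_b = rng.uniform(0.1, 0.5, n_snps)

    # Population A eQTLs sit at the first n_causal SNPs.
    effects_a = np.zeros(n_snps)
    effects_a[:n_causal] = rng.normal(size=n_causal)

    # Non-shared scenario: population B eQTLs sit at a disjoint set of SNPs.
    effects_b_disjoint = np.zeros(n_snps)
    effects_b_disjoint[n_causal:2 * n_causal] = rng.normal(size=n_causal)

    # Train the predictor in population A only.
    train = simulate_genotypes(rng, n, maf_a)
    weights = fit_ridge(train, simulate_expression(rng, train, effects_a, h2))

    # Evaluate in held-out samples from A and in population B.
    test_a = simulate_genotypes(rng, n, maf_a)
    test_b = simulate_genotypes(rng, n, maf_b)
    return {
        "same_population": prediction_r2(
            test_a, simulate_expression(rng, test_a, effects_a, h2), weights),
        "shared_eqtls": prediction_r2(
            test_b, simulate_expression(rng, test_b, effects_a, h2), weights),
        "disjoint_eqtls": prediction_r2(
            test_b, simulate_expression(rng, test_b, effects_b_disjoint, h2), weights),
    }


if __name__ == "__main__":
    for scenario, r2 in run_demo().items():
        print(f"{scenario}: R^2 = {r2:.3f}")
```

In this toy setting the predictor trained in population A transfers to population B only when B uses the same eQTLs. Because the sketch ignores linkage disequilibrium, it does not capture the partial loss of accuracy that the abstract reports for real data.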
