Thousands of missing variants in the UK Biobank are recoverable by genome realignment
Author(s) - Jia Tongqiu, Munson Brenton, Lango Allen Hana, Ideker Trey, Majithia Amit R.
Publication year - 2020
Publication title - Annals of Human Genetics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.537
H-Index - 77
eISSN - 1469-1809
pISSN - 0003-4800
DOI - 10.1111/ahg.12383
Subject(s) - biobank, exome, contig, heuristic, exome sequencing, set (abstract data type), resource (disambiguation), computer science, computational biology, human genome, genome, population, missing data, biology, data mining, genetics, gene, artificial intelligence, machine learning, mutation, medicine, environmental health, computer network, programming language
The UK Biobank is an unprecedented resource for human disease research. In March 2019, 49,997 exomes were made publicly available to investigators. Here we note that thousands of variant calls are unexpectedly absent from this dataset, with 641 genes showing zero variation. We show that the reason for this was an erroneous read alignment to the GRCh38 reference. The missing variants can be recovered by modifying read alignment parameters to correctly handle the expanded set of contigs available in the human genome reference. Given the size and complexity of such population scale datasets, we propose a simple heuristic that can uncover systematic errors using summary data accessible to most investigators.
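The abstract does not spell out the proposed heuristic, so the following is only a minimal sketch of one plausible reading: count variant calls per gene from summary data and flag genes with none. The function name, gene labels, and toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter


def genes_without_variants(gene_names, variant_gene_labels):
    """Return, in sorted order, the genes that have no variant calls.

    gene_names: every gene expected to be covered (hypothetical input).
    variant_gene_labels: one gene label per variant call (hypothetical input).
    """
    counts = Counter(variant_gene_labels)
    # A Counter returns 0 for labels it has never seen.
    return sorted(g for g in set(gene_names) if counts[g] == 0)


# Toy data, invented for illustration only.
all_genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
variant_genes = ["GENE_A", "GENE_A", "GENE_C"]

missing = genes_without_variants(all_genes, variant_genes)
print(missing)  # ['GENE_B', 'GENE_D']
```

In a cohort of roughly 50,000 exomes, a gene with no variant calls at all is unusual enough to merit a check of the upstream alignment, which is the kind of systematic error the abstract describes.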
