Computational pan-genomics: status, promises and challenges
Author(s) -
Tobias Marschall,
Manja Marz,
Thomas Abeel,
Louis J. Dijkstra,
Bas E. Dutilh,
Ali Ghaffaari,
Paul Kersey,
Wigard P. Kloosterman,
Veli Mäkinen,
Adam M. Novak,
Benedict Paten,
David Porubský,
Éric Rivals,
Can Alkan,
Jasmijn A. Baaijens,
Paul I. W. de Bakker,
Valentina Boeva,
Raoul J. P. Bonnal,
Francesca Chiaromonte,
Rayan Chikhi,
Francesca D. Ciccarelli,
Robin Cijvat,
Erwin Datema,
Cornelia M. van Duijn,
Evan E. Eichler,
Corinna Ernst,
Eleazar Eskin,
Erik Garrison,
Mohammed El-Kebir,
Gunnar W. Klau,
Jan O. Korbel,
Eric-Wubbo Lameijer,
Ben Langmead,
Marcel Martin,
Paul Medvedev,
John C. Mu,
Pieter B. Neerincx,
Klaasjan G. Ouwens,
Pierre Peterlongo,
Nadia Pisanti,
Sven Rahmann,
Benjamin J. Raphael,
Knut Reinert,
Dick de Ridder,
Jeroen de Ridder,
Matthias Schlesner,
Ole Schulz-Trieglaff,
Ashley D. Sanders,
Siavash Sheikhizadeh,
Carl Shneider,
Sandra Smit,
Daniel Valenzuela,
Jiayin Wang,
Lodewyk F. A. Wessels,
Ying Zhang,
Victor Guryev,
Fabio Vandin,
Kai Ye,
Alexander Schönhuth
Publication year - 2016
Publication title - Briefings in Bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 3.204
H-Index - 113
eISSN - 1477-4054
pISSN - 1467-5463
DOI - 10.1093/bib/bbw089
Subject(s) - genomics, computer science, data science, computational genomics, genome, construct (python library), homo sapiens, computational biology, encode, computational model, biology, artificial intelligence, genetics, gene, programming language, sociology, anthropology
Many disciplines, from human genetics and oncology to plant breeding, microbiology and virology, commonly face the challenge of analyzing rapidly increasing numbers of genomes. In the case of Homo sapiens, the number of sequenced genomes will approach hundreds of thousands in the next few years. Simply scaling up established bioinformatics pipelines will not be sufficient for leveraging the full potential of such rich genomic data sets. Instead, novel, qualitatively different computational methods and paradigms are needed. We will witness the rapid extension of computational pan-genomics, a new sub-area of research in computational biology. In this article, we generalize existing definitions and understand a pan-genome as any collection of genomic sequences to be analyzed jointly or to be used as a reference. We examine already available approaches to constructing and using pan-genomes, discuss the potential benefits of future technologies and methodologies, and review open challenges from the vantage point of the above-mentioned biological disciplines. As a prominent example of a computational paradigm shift, we particularly highlight the transition from the representation of reference genomes as strings to representations as graphs. We outline how this and other challenges from different application domains translate into common computational problems, point out relevant bioinformatics techniques and identify open problems in computer science. With this review, we aim to increase awareness that a joint approach to computational pan-genomics can help address many of the problems currently faced in various domains.