CoMSA: compression of protein multiple sequence alignment files
Author(s) -
Sebastian Deorowicz,
Joanna Walczyszyn,
Agnieszka Debudaj-Grabysz
Publication year - 2018
Publication title -
Bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 3.599
H-Index - 390
eISSN - 1367-4811
pISSN - 1367-4803
DOI - 10.1093/bioinformatics/bty619
Subject(s) - computer science , sequence (biology) , sequence alignment , compression (physics) , protein sequencing , computational biology , peptide sequence , biology , genetics , gene , materials science , composite material
Bioinformatics databases are growing rapidly and have reached sizes that were hard to imagine a decade ago. Among the many bioinformatics processes that generate hundreds of gigabytes of data is the multiple sequence alignment of protein families. The largest database of such alignments, Pfam, occupies 40-230 GB, depending on the variant. Storing and transferring such massive data has become a challenge.