emMAW: computing minimal absent words in external memory
Author(s) -
Alice Héliou,
Solon P. Pissis,
Simon J. Puglisi
Publication year - 2017
Publication title -
bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 3.599
H-Index - 390
eISSN - 1367-4811
pISSN - 1367-4803
DOI - 10.1093/bioinformatics/btx209
Subject(s) - computer science , auxiliary memory , word (group theory) , byte , suffix , computation , sequence (biology) , software , process (computing) , genome , theoretical computer science , algorithm , programming language , biology , mathematics , genetics , linguistics , philosophy , geometry , gene , computer hardware
The biological significance of minimal absent words has been investigated in genomes of organisms from all domains of life. For instance, three minimal absent words of the human genome were found in Ebola virus genomes. There exists an O(n) -time and O(n) -space algorithm for computing all minimal absent words of a sequence of length n on a fixed-sized alphabet based on suffix arrays. A standard implementation of this algorithm, when applied to a large sequence of length n , requires more than 20 n bytes of RAM. Such memory requirements are a significant hurdle to the computation of minimal absent words in large datasets.
Accelerating Research
Robert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom
Address
John Eccles HouseRobert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom