An Efficient Horizontal and Vertical Method for Online DNA Sequence Compression
Author(s) -
Kamta Nath Mishra,
Anupam Aaggarwal,
Edries Abdelhadi,
Prakash C. Srivastava
Publication year - 2010
Publication title -
International Journal of Computer Applications
Language(s) - English
Resource type - Journals
ISSN - 0975-8887
DOI - 10.5120/757-954
Subject(s) - computer science , sequence (biology) , compression (physics) , information retrieval , algorithm , materials science , composite material , genetics , biology
DNA matching has become one of the most widely used biometric identification methods in recent years. DNA stores the information for creating and organizing an organism. It can be thought of as a string over the alphabet {A, C, G, T, N}, where A, C, G, and T denote the four nucleotide bases that make it up and N represents an unknown nucleotide, which may be any of A, C, G, or T. Sequence sizes range from millions to billions of nucleotides. Compression of DNA is interesting for both practical reasons (such as reduced storage and transmission cost) and functional reasons (such as inferring structure and function from compression models). We present a new lossless compression algorithm that compresses data first horizontally and then vertically. It is based on substitution and statistical methods. We claim that our algorithm achieves one of the best compression ratios for benchmark DNA sequences in comparison to other DNA sequence compression methods.
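To make the alphabet description concrete, the following is a minimal sketch of a common baseline for DNA storage: packing each known base into 2 bits and recording the positions of N separately. This is not the horizontal and vertical algorithm of the paper, whose details the abstract does not give. The names `pack`, `unpack`, `BASE_TO_BITS`, and `BITS_TO_BASE` are hypothetical and chosen only for illustration.

```python
# Baseline 2-bit packing for DNA strings over {A, C, G, T, N}.
# Assumption: this is an illustrative baseline, NOT the paper's method.

BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def pack(seq):
    """Pack a DNA string into bytes, four bases per byte.

    Returns (packed_bytes, sequence_length, positions_of_N).
    Unknown nucleotides (N) are stored as a placeholder code 0 and
    restored from the separate position list when unpacking.
    """
    n_positions = []
    packed = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        if base == "N":
            n_positions.append(i)
            code = 0  # placeholder; the real value is kept in n_positions
        else:
            code = BASE_TO_BITS[base]  # raises KeyError on invalid symbols
        packed[i // 4] |= code << (2 * (i % 4))
    return bytes(packed), len(seq), n_positions


def unpack(packed, length, n_positions):
    """Invert pack(): rebuild the original DNA string losslessly."""
    bases = [
        BITS_TO_BASE[(packed[i // 4] >> (2 * (i % 4))) & 3]
        for i in range(length)
    ]
    for i in n_positions:
        bases[i] = "N"
    return "".join(bases)


if __name__ == "__main__":
    original = "ACGTNNACGT"
    packed, length, n_positions = pack(original)
    restored = unpack(packed, length, n_positions)
    print(len(original), "bases stored in", len(packed), "bytes")
    print("round trip ok:", restored == original)
```

Under this baseline each known base costs 2 bits instead of the 8 bits of a one-byte-per-character text encoding, which is the reference point that substitution-based and statistical DNA compressors are usually measured against.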