
A novel approach to compress DNA repetitive sequences in bio-informatics
Author(s) -
S Mohan Babu Chowdary,
Samparthi V S Kumar,
Deepak Nedunuri,
Vmnssvkr Gupta
Publication year - 2019
Publication title -
Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/1228/1/012026
Subject(s) - cytosine , genbank , dna , encode , set (abstract data type) , base pair , computer science , thymine , alphabet , guanine , sequence (biology) , computational biology , algorithm , theoretical computer science , nucleotide , biology , genetics , gene , programming language , linguistics , philosophy
In recent years, many gigabytes of nucleotide sequences have been stored in the common database GenBank. Anyone using deoxyribonucleic acid (DNA) sequences for biological purposes needs to store large numbers of genomes economically, in compressed form. Although DNA sequences are stored in compressed form, the information about those sequences is held in biological databases. For the four-letter DNA alphabet (Adenine (A), Cytosine (C), Guanine (G) and Thymine (T)), an average description length of 2 bits per base is the maximum required to encode DNA. This work re-examines prior compression techniques along with their merits and demerits. Based on a comparative study of existing algorithms, a new method is proposed for DNA compression that does not depend on the statistics of the sequence set.
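
The 2-bits-per-base figure mentioned in the abstract can be illustrated with a minimal sketch of fixed-width packing. This is only the baseline encoding that compression methods try to improve on, not the method proposed in the paper; the particular bit assignment and the function names `pack` and `unpack` are assumptions made for illustration.

```python
# Baseline 2-bit packing for the alphabet {A, C, G, T}.
# Illustrative only: this is NOT the method proposed in the paper.

# Assumed mapping; any fixed assignment of the four 2-bit codes works.
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BASES = "ACGT"  # index i holds the base whose code is i


def pack(seq: str) -> bytes:
    """Pack a DNA string into bytes, four bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        # Left-align a short final chunk so every byte decodes the same way.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, n: int) -> str:
    """Recover the first n bases from packed bytes."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 0b11])
    # The sequence length n is stored separately to discard padding.
    return "".join(bases[:n])
```

A sequence of n bases occupies ceil(n / 4) bytes under this scheme, so a 7-base string such as "GATTACA" fits in 2 bytes instead of 7 ASCII bytes. The length must be kept alongside the packed data because the final byte may contain padding.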