An intelligent ubiquitous compression technique for DNA sequencing using Hadoop
Author(s) -
Sun Wenlin,
Sharma Ashutosh,
Asenso Evans
Publication year - 2022
Publication title -
The Journal of Engineering
Language(s) - English
Resource type - Journals
ISSN - 2051-3305
DOI - 10.1049/tje2.12193
Subject(s) - computer science , substring , data compression , sequence (biology) , compression (physics) , lossless compression , data mining , set (abstract data type) , algorithm , real time computing , materials science , biology , composite material , genetics , programming language
This work addresses the problem of reducing the storage footprint of massive biomedical data in practical applications while making efficient use of existing storage devices and bandwidth resources to store and share data. The proposed model includes two compression modes: the first is a single‐sequence compression mode designed to exploit the large number of repeated substrings in DNA sequences; the second is a reference‐based multi‐sequence compression mode designed to exploit the high similarity between DNA sequences of different individuals of the same species. Both modes use the Lempel–Ziv–Welch (LZW) compression method, and are therefore closely comparable to each other, to examine the sequences and to identify and classify the repetitive data that exists within a single sequence and across the multiple sequences in a sequence set. The proposed method aims to relieve the heavy load caused by single‐point processing of large sequence files, reduces redundant information by exploiting the local correlation of the data, and uses the computing resources of a cloud platform for biological information processing to support the efficient storage, transmission, and sharing of data.
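To make the single-sequence mode more concrete, the following is a minimal sketch of textbook LZW applied to a DNA string. It is an illustration only, not the authors' implementation: the function names `lzw_compress` and `lzw_decompress`, the four-letter `ACGT` starting dictionary, and the use of plain integer codes are all assumptions made here, and the Hadoop distribution and reference-based mode described in the abstract are not modelled.

```python
def lzw_compress(seq, alphabet="ACGT"):
    """Textbook LZW: encode seq as a list of integer dictionary codes.

    Assumption for this sketch: the dictionary starts with one entry
    per base in `alphabet`, and grows by one entry per emitted code.
    """
    table = {ch: i for i, ch in enumerate(alphabet)}
    codes = []
    current = ""
    for ch in seq:
        if ch not in alphabet:
            raise ValueError(f"symbol {ch!r} is not in the alphabet")
        candidate = current + ch
        if candidate in table:
            # Keep extending the match while it is a known substring.
            current = candidate
        else:
            # Emit the longest known match and learn the new substring.
            codes.append(table[current])
            table[candidate] = len(table)
            current = ch
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes, alphabet="ACGT"):
    """Invert lzw_compress by rebuilding the same dictionary on the fly."""
    if not codes:
        return ""
    table = {i: ch for i, ch in enumerate(alphabet)}
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            # Code refers to the entry that is about to be created.
            entry = previous + previous[0]
        else:
            raise ValueError(f"invalid LZW code {code}")
        out.append(entry)
        table[len(table)] = previous + entry[0]
        previous = entry
    return "".join(out)


if __name__ == "__main__":
    seq = "ACGTACGTACGTACGT"
    codes = lzw_compress(seq)
    print(len(seq), "bases ->", len(codes), "codes")
    assert lzw_decompress(codes) == seq
```

Because repeated substrings are added to the dictionary as they are seen, a highly repetitive sequence is emitted as fewer codes than it has bases, which is the property the single-sequence mode relies on. A practical encoder would also pack the codes into variable-width bit fields rather than keep them as Python integers.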