Open Access
Alignment-free Sequence Searching over Whole Genomes Using 3D random plot of Query DNA Sequences
Author(s) - Dayoung Lee
Publication year - 2018
Publication title - Informatica
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.172
H-Index - 34
eISSN - 1854-3871
pISSN - 0350-5596
DOI - 10.31449/inf.v42i3.2276
Subject(s) - alignment free sequence analysis, computer science, sequence (biology), edit distance, preprocessor, genome, similarity (geometry), sequence alignment, algorithm, data mining, artificial intelligence, biology, image (mathematics), genetics, gene, peptide sequence
Most genomic data studies rely on sequence comparison and search, and comparison models based on alignment algorithms are the most commonly used. Alignment is very accurate, but it is practical only when the query is short, on the order of kilobytes, because it requires quadratic time and space, O(n^2), where n is the length of the target and query sequences. With the development of Next Generation Sequencing techniques, whole genome sequence data of megabyte size are being actively studied, and new comparison and search methods for large-scale sequence data are needed. We propose a new alignment-free sequence comparison and search method to overcome the limitations of the alignment-based model. In this graphical model, the sequence search problem on DNA strings is reduced to finding parts of a geometric object within a relatively small geometric space. When sequences of similar length are modified and compared, the model accurately reflects their degree of similarity, which confirms that the comparison model is appropriate. When a 200 MB whole genome sequence was searched with the query comparison model, using compressed coordinate information, 10 MB sequences were searched in 22 s, far less time than alignment requires. Although the model cannot locate exact positions at base-pair resolution as an alignment result does, it can be used as a preprocessing step to quickly search whole genome sequences of several hundred megabytes.
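To make the quadratic cost of the alignment-based baseline concrete, the following is a minimal sketch of the classic edit-distance dynamic program. It is not the method proposed in the paper; the function name `edit_distance` and the unit costs for insertion, deletion, and substitution are assumptions chosen for illustration. The table it fills has (len(a) + 1) x (len(b) + 1) cells, which is where the O(n^2) time and space come from.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via a full dynamic-programming table.

    Illustrative baseline only (unit costs are an assumption); it is
    not the alignment-free model described in the abstract.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # dp[i][j] = edit distance between a[:i] and b[:j]
    dp = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dp[i][0] = i  # delete all i characters of a[:i]
    for j in range(cols):
        dp[0][j] = j  # insert all j characters of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # match or substitution
            )
    return dp[-1][-1]


if __name__ == "__main__":
    print(edit_distance("ACGT", "AGGT"))
```

Under this formulation, comparing two sequences of 10 million bases each would require a table of about 10^14 cells, which suggests why a full alignment becomes impractical at whole-genome scale.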
