
The effect of genome graph expressiveness on the discrepancy between genome graph distance and string set distance
Author(s) - Yutong Qiu, Carl Kingsford
Publication year - 2022
Publication title - Bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 3.599
H-Index - 390
eISSN - 1367-4811
pISSN - 1367-4803
DOI - 10.1093/bioinformatics/btac264
Subject(s) - edit distance, string (physics), genome, graph traversal, mathematics, combinatorics, graph, string metric, discrete mathematics, computer science, algorithm, string searching algorithm, data structure, biology, genetics, gene, mathematical physics, programming language
Intra-sample heterogeneity refers to the phenomenon in which a single genomic sample contains a diverse set of genomic sequences. In practice, the true set of strings in a sample is often unknown because of limitations in sequencing technology. To compare heterogeneous samples, genome graphs can be used to represent such sets of strings. However, a genome graph can generally represent a string set universe: a collection that contains many sets of strings in addition to the true string set. This difference between genome graphs and string sets is not well characterized. As a result, a distance between two genome graphs may not match the distance between the true string sets they are meant to represent.
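The gap between a graph and its string set universe can be seen on a toy example. The sketch below is only an illustration under stated assumptions, not the authors' method. It assumes a directed acyclic graph with single-character node labels, treats a string set as "represented" by the graph when every string is spelled by a source-to-sink path and the set's paths cover every edge, and uses a hypothetical nearest-neighbour edit-distance sum (`set_distance`) as a stand-in string set distance; the paper's own distance definitions may differ. All function and variable names are hypothetical. Two different string sets turn out to be represented by the same graph, so any distance computed between their graphs is zero while the string sets themselves differ.

```python
# Hypothetical toy example: one genome graph, two different "true" string sets.
# Assumptions: DAG with single-character node labels; each spelled string
# corresponds to exactly one source-to-sink path.

def enumerate_paths(labels, edges, source, sinks):
    """Map each string spelled by a source-to-sink path to the edges it uses."""
    adj = {u: [] for u in labels}
    for u, v in edges:
        adj[u].append(v)
    paths = {}

    def dfs(node, spelled, used):
        spelled += labels[node]
        if node in sinks:
            paths[spelled] = frozenset(used)
        for nxt in adj[node]:
            dfs(nxt, spelled, used + [(node, nxt)])

    dfs(source, "", [])
    return paths


def edit_distance(a, b):
    """Levenshtein distance via the standard row-by-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution or match
        prev = cur
    return prev[-1]


def set_distance(s1, s2):
    """Hypothetical string set distance (NOT the paper's definition): each
    string's edit distance to its closest string in the other set, summed
    over both directions."""
    return (sum(min(edit_distance(x, y) for y in s2) for x in s1)
            + sum(min(edit_distance(y, x) for x in s1) for y in s2))


# A graph with two "bubbles": A -> {C, G} -> T -> {A, T}.
labels = {0: "A", 1: "C", 2: "G", 3: "T", 4: "A", 5: "T"}
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4), (3, 5)]
paths = enumerate_paths(labels, edges, source=0, sinks={4, 5})
universe = set(paths)  # the string set universe spelled by the graph


def edges_covered(string_set):
    """Union of the edges used by the paths spelling each string in the set."""
    return frozenset().union(*(paths[s] for s in string_set))


s1 = {"ACTA", "AGTT"}  # one candidate true string set
s2 = {"ACTT", "AGTA"}  # a different candidate, same graph

print(sorted(universe))                                        # ['ACTA', 'ACTT', 'AGTA', 'AGTT']
print(edges_covered(s1) == edges_covered(s2) == frozenset(edges))  # True
print(set_distance(s1, s2))                                    # 4
```

Both candidate sets use every edge of the same graph, so the graph alone cannot tell them apart, yet under this stand-in measure the sets are at distance 4. Enumerating all paths is exponential in the number of bubbles and is meant only for tiny illustrative graphs.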