Open Access
Using cluster edge counting to aggregate iterations of centroid-linkage clustering results and avoid large distance matrices
Author(s) - Matthew Kellom, Jason Raymond
Publication year - 2017
Publication title - Journal of Biological Methods
Language(s) - English
Resource type - Journals
ISSN - 2326-9901
DOI - 10.14440/jbm.2017.153
Subject(s) - cluster analysis, centroid, computer science, data mining, linkage (software), aggregate (composite), bootstrapping (finance), cluster (spacecraft), single linkage clustering, pruning, enhanced data rates for gsm evolution, sequence (biology), complete linkage, complete linkage clustering, hierarchical clustering, algorithm, correlation clustering, artificial intelligence, cure data clustering algorithm, mathematics, biochemistry, chemistry, materials science, genetics, biology, genotype, single nucleotide polymorphism, agronomy, composite material, econometrics, gene, programming language
Sequence clustering is a fundamental tool of molecular biology that is being challenged by increasing dataset sizes from high-throughput sequencing. The agglomerative algorithms that have been relied upon for their accuracy require the construction of computationally costly distance matrices, which can overwhelm the personal computers used for basic research. Alternative algorithms such as centroid-linkage circumvent the large memory requirements, but their results often depend on the order in which sequences are input. We present a method for bootstrapping the results of many centroid-linkage clustering iterations into an aggregate set of clusters, increasing cluster accuracy without a distance matrix. The method ranks cluster edges by their conservation across iterations and reconstructs aggregate clusters from the resulting ranked edge list, pruning low-frequency cluster edges that may have resulted from a specific sequence input order. Aggregating centroid-linkage clustering iterations can help researchers working on such computers obtain more reliable clustering results without additional memory resources.
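
The edge-counting idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not confirm: an "edge" is taken to be any pair of sequences placed in the same cluster, each iteration is given as a list of clusters of sequence IDs, pruning uses a hypothetical `min_fraction` cutoff, and aggregate clusters are rebuilt as connected components of the retained edges using a union-find structure. The function names and the toy data are invented for illustration.

```python
from collections import Counter
from itertools import combinations


def count_edges(iterations):
    """Count how many iterations place each pair of sequences in one cluster.

    Assumption: an edge is an unordered pair of co-clustered sequence IDs.
    """
    counts = Counter()
    for clusters in iterations:
        for cluster in clusters:
            # Sorting gives every pair a single canonical (a, b) ordering.
            for edge in combinations(sorted(set(cluster)), 2):
                counts[edge] += 1
    return counts


def aggregate_clusters(iterations, min_fraction=0.5):
    """Rebuild clusters from edges seen in at least min_fraction of iterations.

    Assumption: reconstruction is done by merging the endpoints of each
    retained edge (connected components), walking the edge list from the
    most to the least conserved edge.
    """
    counts = count_edges(iterations)
    threshold = min_fraction * len(iterations)

    # Every sequence starts as its own cluster (union-find parent table).
    parent = {}
    for clusters in iterations:
        for cluster in clusters:
            for seq in cluster:
                parent.setdefault(seq, seq)

    def find(seq):
        # Follow parent links to the root, shortening the path as we go.
        while parent[seq] != seq:
            parent[seq] = parent[parent[seq]]
            seq = parent[seq]
        return seq

    # most_common() yields edges ranked by conservation across iterations.
    for (a, b), seen in counts.most_common():
        if seen < threshold:
            break  # every remaining edge is rarer still, so prune them all
        parent[find(a)] = find(b)

    groups = {}
    for seq in parent:
        groups.setdefault(find(seq), []).append(seq)
    return sorted(sorted(group) for group in groups.values())


# Toy input: three iterations over the same five sequences. The third
# iteration mimics an input-order artifact that splits s3 from s1 and s2.
iterations = [
    [["s1", "s2", "s3"], ["s4", "s5"]],
    [["s2", "s1", "s3"], ["s4", "s5"]],
    [["s1", "s2"], ["s3", "s4"], ["s5"]],
]
aggregated = aggregate_clusters(iterations, min_fraction=0.5)
print(aggregated)  # [['s1', 's2', 's3'], ['s4', 's5']]
```

In this toy run the edge between s3 and s4 appears in only one of three iterations and is pruned, so the aggregate result matches the two clusters found by the majority of iterations. Note that counting all co-clustered pairs grows quadratically with cluster size, so this edge definition is chosen for clarity rather than efficiency.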
