A Multi-metric Algorithm for Hierarchical Clustering of Same-Length Protein Sequences
Author(s) -
Sotirios–Filippos Tsarouchis,
Maria Th. Kotouza,
Fotis Psomopoulos,
Pericles A. Mitkas
Publication year - 2018
Publication title -
IFIP Advances in Information and Communication Technology
Language(s) - English
Resource type - Book series
SCImago Journal Rank - 0.189
H-Index - 53
eISSN - 1868-422X
pISSN - 1868-4238
DOI - 10.1007/978-3-319-92016-0_18
Subject(s) - cluster analysis, hierarchical clustering, identification (biology), hierarchy, computer science, tracing, metric (unit), computational biology, similarity (geometry), algorithm, pattern recognition (psychology), data mining, artificial intelligence, biology, image (mathematics), engineering, operations management, botany, economics, market economy, operating system
The identification of meaningful groups of proteins has always been a major area of interest for structural and functional genomics. Successful protein clustering can lead to significant insight, assisting both in tracing the evolutionary history of the respective molecules and in identifying potential functions and interactions of novel sequences. Here we propose a clustering algorithm for same-length sequences, which allows the construction of a subset hierarchy and facilitates the identification of the underlying patterns for any given subset. The proposed method uses the metrics of sequence identity and amino-acid similarity simultaneously as direct measures. The algorithm was applied to a real-world dataset consisting of clonotypic immunoglobulin (IG) sequences from chronic lymphocytic leukemia (CLL) patients, showing promising results.
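To make the idea of combining the two metrics concrete, here is a minimal Python sketch. It is not the authors' procedure, which the abstract does not specify. The amino-acid grouping in `SIMILARITY_GROUPS`, the `weight` parameter, and the bottom-up average-linkage strategy are all assumptions chosen for illustration, and every function name is hypothetical.

```python
from itertools import combinations

# ASSUMPTION: an illustrative physicochemical grouping of the 20 amino acids.
# The grouping used in the paper may differ.
SIMILARITY_GROUPS = ["AVLIM", "FWY", "STNQ", "KRH", "DE", "CGP"]
GROUP_OF = {aa: idx for idx, group in enumerate(SIMILARITY_GROUPS) for aa in group}


def _check(a, b):
    """Both metrics are only defined here for non-empty, same-length sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")


def identity(a, b):
    """Fraction of positions holding exactly the same residue."""
    _check(a, b)
    return sum(x == y for x, y in zip(a, b)) / len(a)


def similarity(a, b):
    """Fraction of positions whose residues fall in the same assumed group."""
    _check(a, b)
    return sum(GROUP_OF[x] == GROUP_OF[y] for x, y in zip(a, b)) / len(a)


def combined_distance(a, b, weight=0.5):
    """ASSUMPTION: a weighted mix of both metrics, turned into a distance."""
    return 1.0 - (weight * identity(a, b) + (1.0 - weight) * similarity(a, b))


def hierarchical_clusters(seqs, weight=0.5):
    """Naive average-linkage agglomeration; returns the hierarchy as nested tuples."""
    members = [[s] for s in seqs]  # flat member list of each current cluster
    trees = list(seqs)             # nested-tuple hierarchy of each current cluster
    while len(trees) > 1:
        best = None
        for i, j in combinations(range(len(trees)), 2):
            pairs = [(a, b) for a in members[i] for b in members[j]]
            d = sum(combined_distance(a, b, weight) for a, b in pairs) / len(pairs)
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        merged_members = members[i] + members[j]
        merged_tree = (trees[i], trees[j])
        for k in (j, i):  # delete the higher index first so i stays valid
            del members[k]
            del trees[k]
        members.append(merged_members)
        trees.append(merged_tree)
    return trees[0]


if __name__ == "__main__":
    # Toy same-length sequences, made up for illustration only.
    print(hierarchical_clusters(["CARDY", "CAKDY", "CGWSF"]))
```

In this toy input, `CARDY` and `CAKDY` differ only by an arginine/lysine substitution, which the assumed grouping treats as similar, so they merge before either joins `CGWSF`. A sketch like this scales quadratically or worse with the number of sequences and is meant only to show how identity and similarity can be read side by side.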