Premium
Finding exemplars in dense data with affinity propagation on clusters of GPUs
Author(s) -
Kurdziel Marcin,
Boryczko Krzysztof
Publication year - 2012
Publication title -
concurrency and computation: practice and experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.2962
Subject(s) - computer science , memory footprint , parallel computing , affinity propagation , cluster analysis , similarity (geometry) , metric (unit) , set (abstract data type) , computation , floating point , decomposition , graphics processing unit , matrix (chemical analysis) , algorithm , artificial intelligence , image (mathematics) , correlation clustering , ecology , operations management , materials science , canopy clustering algorithm , composite material , economics , biology , programming language , operating system
SUMMARY This work presents an efficient implementation of affinity propagation (AP) on clusters of graphical processing units (GPUs). AP is a state‐of‐the‐art method for finding exemplars in data sets described by similarity matrices. It is typically employed in crisp clustering applications. However, when finding exemplars in an n ‐pattern data set with dense, non‐metric similarities, AP performs iterative processing of three n × n floating point matrices. One of them stores the similarities, and the other two store the values that will ultimately pinpoint the exemplars. For large similarity matrices, AP is therefore computationally expensive. Although matrix operations of AP are well suited for GPUs, its memory footprint limits the size of tasks that can be solved on one unit. We present, however, a decomposition scheme for AP that distributes the calculations over multiple GPUs, with low communication‐to‐computation ratio. Because of this favorable communication pattern, our implementation finds exemplars in large, dense similarity data efficiently, even when GPUs are connected by a slow network. Furthermore, by combining global device memory of multiple GPUs, it can solve problems that would not fit in a single unit. Copyright © 2012 John Wiley & Sons, Ltd.