Finding exemplars in dense data with affinity propagation on clusters of GPUs | Zendy

Kurdziel Marcin | Zendy; Boryczko Krzysztof | Zendy

AI Assistant Blog Pricing

Home ZAIA Blog

Premium

Finding exemplars in dense data with affinity propagation on clusters of GPUs

Author(s) -

Kurdziel Marcin,

Boryczko Krzysztof

Publication year - 2012

Publication title -

concurrency and computation: practice and experience

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.309

H-Index - 67

eISSN - 1532-0634

pISSN - 1532-0626

DOI - 10.1002/cpe.2962

Subject(s) - computer science , memory footprint , parallel computing , affinity propagation , cluster analysis , similarity (geometry) , metric (unit) , set (abstract data type) , computation , floating point , decomposition , graphics processing unit , matrix (chemical analysis) , algorithm , artificial intelligence , image (mathematics) , correlation clustering , ecology , operations management , materials science , canopy clustering algorithm , composite material , economics , biology , programming language , operating system

SUMMARY This work presents an efficient implementation of affinity propagation (AP) on clusters of graphical processing units (GPUs). AP is a state‐of‐the‐art method for finding exemplars in data sets described by similarity matrices. It is typically employed in crisp clustering applications. However, when finding exemplars in an n ‐pattern data set with dense, non‐metric similarities, AP performs iterative processing of three n × n floating point matrices. One of them stores the similarities, and the other two store the values that will ultimately pinpoint the exemplars. For large similarity matrices, AP is therefore computationally expensive. Although matrix operations of AP are well suited for GPUs, its memory footprint limits the size of tasks that can be solved on one unit. We present, however, a decomposition scheme for AP that distributes the calculations over multiple GPUs, with low communication‐to‐computation ratio. Because of this favorable communication pattern, our implementation finds exemplars in large, dense similarity data efficiently, even when GPUs are connected by a slow network. Furthermore, by combining global device memory of multiple GPUs, it can solve problems that would not fit in a single unit. Copyright © 2012 John Wiley & Sons, Ltd.

This content is not available in your region!

Continue researching here.

Having issues? You can contact us here

Accelerating Research