A novel approach for high‐dimensional vector similarity join query | Zendy

Ma Youzhong | Zendy; Jia Shijie | Zendy; Zhang Yongxin | Zendy

AI Assistant Blog Pricing

Home ZAIA Blog

Premium

A novel approach for high‐dimensional vector similarity join query

Author(s) -

Ma Youzhong,

Jia Shijie,

Zhang Yongxin

Publication year - 2016

Publication title -

concurrency and computation: practice and experience

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.309

H-Index - 67

eISSN - 1532-0634

pISSN - 1532-0626

DOI - 10.1002/cpe.3952

Subject(s) - join (topology) , joins , computer science , nested loop join , scalability , speedup , similarity (geometry) , piecewise , theoretical computer science , aggregate (composite) , curse of dimensionality , data mining , algorithm , database , mathematics , artificial intelligence , parallel computing , combinatorics , image (mathematics) , mathematical analysis , materials science , composite material , programming language

SUMMARY In this paper, we mainly focus on similarity joins on massive high‐dimensional vectors, it is a costly operation because of curse of dimensionality. The main idea of our proposed solution is to design a novel filtering method that can filter out as many vector pairs as possible at relative low cost. By using the good dimension reduction ability of piecewise aggregate approximation and symbolic aggregate approximation techniques, we proposed three novel approaches to deal with high dimensional‐vector similarity join query: single‐PAA–based vector similarity join query, multi‐PAA–based vector similarity join query and SAX‐based vector similarity join query. We conducted comprehensive experiments to test the performance of the above approaches, we also test the speedup ratio compared with the naive method: block nested loop join. The experimental results show that our approaches have much better performance than that of block nested loop join and also have good scalability. To the best of our knowledge, this is the first work to try to deal with high‐dimensional vector similarity joins using piecewise aggregate approximation and symbolic aggregate approximation techniques, and the approaches proposed in this paper provide a new way to process the massive high‐dimensional vector data set.

This content is not available in your region!

Continue researching here.

Having issues? You can contact us here

Accelerating Research