Premium
Dual‐tree fast exact max‐kernel search
Author(s) -
Curtin Ryan R.,
Ram Parikshit
Publication year - 2014
Publication title -
statistical analysis and data mining: the asa data science journal
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.381
H-Index - 33
eISSN - 1932-1872
pISSN - 1932-1864
DOI - 10.1002/sam.11218
Subject(s) - combinatorics , kernel (algebra) , tree (set theory) , mathematics , algorithm
The problem of max‐kernel search arises everywhere: given a query point \documentclass{article}\usepackage{amsmath}\usepackage{amssymb}\usepackage{amsbsy}\usepackage{amsfonts}\pagestyle{empty}\begin{document}$p_q$ \end{document} , a set of reference objects \documentclass{article}\usepackage{amsmath}\usepackage{amssymb}\usepackage{amsbsy}\usepackage{amsfonts}\pagestyle{empty}\begin{document}$S_r$ \end{document} and some kernel \documentclass{article}\usepackage{amsmath}\usepackage{amssymb}\usepackage{amsbsy}\usepackage{amsfonts}\pagestyle{empty}\begin{document}$\mathcal{K}$ \end{document} , find \documentclass{article}\usepackage{amsmath}\usepackage{amssymb}\usepackage{amsbsy}\usepackage{amsfonts}\pagestyle{empty}\begin{document}$arg\,max_{p_r \in S_r} \mathcal{K}(p_q, p_r)$ \end{document} . Max‐kernel search is ubiquitous and appears in countless domains of science, thanks to the wide applicability of kernels. A few domains include image matching, information retrieval, bio‐informatics, similarity search, and collaborative filtering (to name just a few). However, there is no generalized technique for efficiently solving max‐kernel search. This paper presents a single‐tree algorithm called single‐tree FastMKS which returns the max‐kernel solution for a single query point in provably \documentclass{article}\usepackage{amsmath}\usepackage{amssymb}\usepackage{amsbsy}\usepackage{amsfonts}\pagestyle{empty}\begin{document}$O(\log N)$ \end{document} time (where \documentclass{article}\usepackage{amsmath}\usepackage{amssymb}\usepackage{amsbsy}\usepackage{amsfonts}\pagestyle{empty}\begin{document}$N$ \end{document} is the number of reference objects), and also a dual‐tree algorithm ( dual‐tree FastMKS ) which is useful for max‐kernel search with many query points. If the set of query points is of size \documentclass{article}\usepackage{amsmath}\usepackage{amssymb}\usepackage{amsbsy}\usepackage{amsfonts}\pagestyle{empty}\begin{document}$O(N)$ \end{document} , this algorithm returns a solution in provably \documentclass{article}\usepackage{amsmath}\usepackage{amssymb}\usepackage{amsbsy}\usepackage{amsfonts}\pagestyle{empty}\begin{document}$O(N)$ \end{document} time, which is significantly better than the \documentclass{article}\usepackage{amsmath}\usepackage{amssymb}\usepackage{amsbsy}\usepackage{amsfonts}\pagestyle{empty}\begin{document}$O(N^2)$ \end{document} linear scan solution; these bounds are dependent on the expansion constant of the data. These algorithms work for abstract objects, as they do not require explicit representation of the points in kernel space. Empirical results for a variety of datasets show up to five orders of magnitude speedup in some cases. In addition, we present approximate extensions of the FastMKS algorithms that can achieve further speedups.