Measuring the Difficulty of Distance-Based Indexing
Author(s) - Matthew Skala
Publication year - 2005
Publication title - Lecture Notes in Computer Science
Language(s) - English
Resource type - Book series
SCImago Journal Rank - 0.249
H-Index - 400
eISSN - 1611-3349
pISSN - 0302-9743
ISBN - 3-540-29740-5
DOI - 10.1007/11575832_12
Subject(s) - search engine indexing, curse of dimensionality, computer science, vector space, similarity (geometry), statistic, nearest neighbor search, vector space model, data mining, data structure, pattern recognition (psychology), artificial intelligence, mathematics, statistics, geometry, image (mathematics), programming language
Data structures for similarity search are commonly evaluated on data in vector spaces, but distance-based data structures are also applicable to non-vector spaces with no natural concept of dimensionality. The intrinsic dimensionality statistic of Chávez and Navarro provides a way to compare the performance of similarity indexing and search algorithms across different spaces, and predict the performance of index data structures on non-vector spaces by relating them to equivalent vector spaces. We characterise its asymptotic behaviour, and give experimental results to calibrate these comparisons.
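As an illustration of the statistic the abstract refers to: the intrinsic dimensionality of Chávez and Navarro is commonly defined as ρ = μ² / (2σ²), where μ and σ² are the mean and variance of the distance between two randomly chosen points. The sketch below estimates ρ by sampling pairs. It is a minimal example under stated assumptions, not the paper's experimental procedure: the function names (`intrinsic_dimensionality`, `uniform_points`), the sample size, and the choice of uniform random vectors under Euclidean distance are all hypothetical choices made for illustration.

```python
import math
import random
import statistics


def intrinsic_dimensionality(points, distance, pairs=2000, seed=0):
    """Estimate rho = mu^2 / (2 * sigma^2) from sampled pairwise distances.

    `points` is any list of objects and `distance` any function of two
    objects, so the estimate applies to non-vector spaces as well.
    The `pairs` and `seed` parameters are illustrative assumptions.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(pairs):
        # Draw two distinct objects from the data set.
        x, y = rng.sample(points, 2)
        samples.append(distance(x, y))
    mu = statistics.fmean(samples)
    var = statistics.pvariance(samples)
    return mu * mu / (2.0 * var)


def uniform_points(n, dim, seed=0):
    """Hypothetical test data: n points uniform in the unit cube."""
    rng = random.Random(seed)
    return [tuple(rng.random() for _ in range(dim)) for _ in range(n)]


if __name__ == "__main__":
    for dim in (2, 4, 8, 16):
        pts = uniform_points(500, dim, seed=dim)
        rho = intrinsic_dimensionality(pts, math.dist)
        print(f"dim={dim:2d}  rho={rho:.2f}")
```

Because the function needs only a distance callable, the same estimate could be run on strings under edit distance or on any other metric space, and the resulting ρ compared against values measured on vector spaces of known dimension.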