Pairwise similarity of TopSig document signatures
Author(s) - Christopher M. De Vries, Shlomo Geva
Publication year - 2012
Publication title - QUT ePrints (Queensland University of Technology)
Language(s) - English
Resource type - Conference proceedings
DOI - 10.1145/2407085.2407103
Subject(s) - pairwise comparison, similarity (geometry), computer science, neighbourhood (mathematics), property (philosophy), vector space model, vector space, information retrieval, artificial intelligence, data mining, pattern recognition (psychology), mathematics, image (mathematics), epistemology, mathematical analysis, philosophy, geometry
This paper analyses the pairwise distances of signatures produced by the TopSig retrieval model on two document collections. The distribution of these distances is compared with that of purely random signatures. The comparison explains why TopSig is competitive with state-of-the-art retrieval models only at early precision. Only the local neighbourhood of the signatures is interpretable. We suggest this is a common property of vector space models.
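The random-signature baseline mentioned in the abstract can be illustrated with a minimal sketch. It assumes signatures are fixed-width bit strings compared by Hamming distance, which the abstract does not state explicitly. The width of 1024 bits, the sample of 200 signatures, and the helper names `random_signatures`, `hamming`, and `pairwise_hamming` are all illustrative choices, not taken from the paper. For uniformly random signatures of n bits, each pairwise distance follows a Binomial(n, 1/2) distribution, so the distances cluster tightly around n/2 with variance n/4.

```python
import random
from itertools import combinations
from statistics import mean, pvariance


def random_signatures(count, bits, seed=0):
    """Return `count` uniformly random signatures, each an int of `bits` bits."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [rng.getrandbits(bits) for _ in range(count)]


def hamming(a, b):
    """Hamming distance: number of bit positions where a and b differ."""
    return bin(a ^ b).count("1")


def pairwise_hamming(signatures):
    """Hamming distance for every unordered pair of signatures."""
    return [hamming(a, b) for a, b in combinations(signatures, 2)]


if __name__ == "__main__":
    bits = 1024  # assumed signature width, for illustration only
    dists = pairwise_hamming(random_signatures(200, bits))
    # Compare the empirical moments with the Binomial(bits, 1/2) values.
    print("mean:", mean(dists), "expected:", bits / 2)
    print("variance:", pvariance(dists), "expected:", bits / 4)
```

Running the same `pairwise_hamming` function over real document signatures and comparing the histogram of distances with this baseline is one way to see how far a collection departs from random, which is the kind of comparison the abstract describes.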