Open Access
Sparse, Dense, and Attentional Representations for Text Retrieval
Author(s) - Yi Luan, Jacob Eisenstein, Kristina Toutanova, Michael Collins
Publication year - 2021
Publication title - Transactions of the Association for Computational Linguistics
Language(s) - English
Resource type - Journals
ISSN - 2307-387X
DOI - 10.1162/tacl_a_00369
Subject(s) - computer science, encoding (memory), encoder, margin (machine learning), dimension (graph theory), dual (grammatical number), artificial intelligence, neural coding, document retrieval, information retrieval, pattern recognition (psychology), machine learning
Dual encoders perform retrieval by encoding documents and queries into dense low-dimensional vectors, scoring each document by its inner product with the query. We investigate the capacity of this architecture relative to sparse bag-of-words models and attentional neural networks. Using both theoretical and empirical analysis, we establish connections between the encoding dimension, the margin between gold and lower-ranked documents, and the document length, suggesting limitations in the capacity of fixed-length encodings to support precise retrieval of long documents. Building on these insights, we propose a simple neural model that combines the efficiency of dual encoders with some of the expressiveness of more costly attentional architectures, and explore sparse-dense hybrids to capitalize on the precision of sparse retrieval. These models outperform strong alternatives in large-scale retrieval.
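As a concrete illustration of the scoring schemes the abstract describes, the sketch below contrasts a dense dual-encoder inner product with a sparse bag-of-words inner product, and combines the two in a sparse-dense hybrid. It is a minimal toy example under stated assumptions: the vocabulary, the random projection standing in for a learned neural encoder, and the interpolation weight lam in hybrid_score are illustrative choices, not the paper's actual models.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary; real systems use a full tokenizer vocabulary.
VOCAB = ["neural", "retrieval", "dense", "sparse", "query", "document"]
DIM = 4  # encoding dimension d; the paper relates d to retrieval fidelity

# Random projection stands in for a learned encoder (e.g. a BERT dual encoder).
projection = rng.normal(size=(len(VOCAB), DIM))

def sparse_encode(text: str) -> np.ndarray:
    """Bag-of-words term-frequency vector (one slot per vocabulary term)."""
    vec = np.zeros(len(VOCAB))
    for token in text.lower().split():
        if token in VOCAB:
            vec[VOCAB.index(token)] += 1.0
    return vec

def dense_encode(text: str) -> np.ndarray:
    """Fixed-length dense encoding: project the bag of words into R^DIM.
    Every document is compressed to DIM numbers regardless of its length."""
    return sparse_encode(text) @ projection

def hybrid_score(query: str, doc: str, lam: float = 0.5) -> float:
    """Linear interpolation of dense and sparse inner products (illustrative)."""
    dense = dense_encode(query) @ dense_encode(doc)
    sparse = sparse_encode(query) @ sparse_encode(doc)
    return lam * dense + (1.0 - lam) * sparse

docs = ["dense neural retrieval", "sparse retrieval document"]
query = "neural retrieval query"
scores = [hybrid_score(query, d) for d in docs]
print(sorted(zip(scores, docs), reverse=True))
```

Note that dense_encode maps every document to a fixed DIM-dimensional vector no matter how long the document is; this fixed-length bottleneck is exactly the capacity limitation the abstract identifies for precise retrieval of long documents, and the motivation for the hybrid.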
