Open Access
Word Embedding based Approaches for Information Retrieval
Author(s) - Dwaipayan Roy
Publication year - 2017
Publication title - Electronic Workshops in Computing
Language(s) - English
Resource type - Conference proceedings
ISSN - 1477-9358
DOI - 10.14236/ewic/fdia2017.9
Subject(s) - word2vec , computer science , word (group theory) , analogy , natural language processing , word embedding , embedding , artificial intelligence , similarity (geometry) , representation (politics) , information retrieval , linguistics , philosophy , politics , political science , law , image (mathematics)
Interest in using word embeddings has grown across many areas of text processing in recent years, following the introduction of the word2vec model by Mikolov et al. (2013) and the GloVe model by Pennington et al. (2014). Word embedding models use large amounts of text to learn low-dimensional representations of words that capture relationships between words without any external supervision. The resulting representations have been shown to reproduce many linguistic regularities, such as semantic similarity between terms, conceptual composition of terms, and analogies between terms. These properties can be exploited in Information Retrieval (IR), where retrieval functions depend primarily on statistical co-occurrences.

Keyword-based retrieval systems often suffer from vocabulary mismatch. For example, given the information need ‘vehicle industry in Germany’, relevant documents may not be retrieved because they contain ‘automobile’ instead of ‘vehicle’. Likewise, documents mentioning ‘volkswagen’ may not receive appropriate weight because of the same mismatch. Research on using word embeddings to overcome the vocabulary mismatch problem is ongoing. This paper presents some of the approaches that use word embeddings to improve retrieval; empirical experiments show that embedding information contributes positively to text retrieval.

The rest of the paper is organized as follows. Section 2 presents a baseline retrieval model that uses word embeddings. Section 3 describes word embedding based query expansion methods. Empirical evidence that both types of models outperform state-of-the-art retrieval models is given after the presentation of each model. Section 4 concludes the paper with some directions for future work.

2. BASELINE RETRIEVALS
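The paper's own models are not reproduced in this abstract. As a rough illustration of how embeddings can ease the vocabulary mismatch described above, the Python sketch below expands a keyword query with the vocabulary terms that are most similar to it. The vectors in `EMBEDDINGS` are hypothetical, hand-picked 3-dimensional toys rather than trained word2vec or GloVe output, and the scoring rule (summing cosine similarity to every query term) is one simple heuristic chosen for illustration, not necessarily the method used in the paper.

```python
import math

# Hypothetical toy 3-d embeddings, hand-picked for illustration only.
# Real vectors would come from a model trained on a large corpus
# (e.g. word2vec or GloVe) and have hundreds of dimensions.
EMBEDDINGS = {
    "vehicle":    [0.90, 0.10, 0.00],
    "automobile": [0.85, 0.15, 0.05],
    "car":        [0.80, 0.20, 0.10],
    "volkswagen": [0.70, 0.30, 0.40],
    "industry":   [0.10, 0.90, 0.10],
    "germany":    [0.20, 0.10, 0.90],
    "banana":     [-0.50, 0.20, -0.30],
}


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def expand_query(query_terms, embeddings, k=2):
    """Return up to k expansion terms for a keyword query.

    Query terms without a vector (e.g. the stopword 'in') are ignored.
    Each remaining vocabulary term is scored by the sum of its cosine
    similarities to all known query terms, so candidates related to the
    query as a whole rank above those close to only one term.
    """
    query = [t for t in query_terms if t in embeddings]
    scores = {}
    for term, vec in embeddings.items():
        if term in query:
            continue  # never "expand" a query with its own terms
        scores[term] = sum(cosine(vec, embeddings[q]) for q in query)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    print(expand_query(["vehicle"], EMBEDDINGS))
    # → ['automobile', 'car']
    print(expand_query("vehicle industry in germany".split(), EMBEDDINGS))
    # → ['volkswagen', 'car']
```

With these toy vectors, the single term ‘vehicle’ is expanded with its nearest neighbours ‘automobile’ and ‘car’, while the full query surfaces ‘volkswagen’, which is moderately close to all three query terms at once. In a real system the candidate pool would be the full embedding vocabulary, and the added terms would usually be weighted below the original query terms.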
