Open Access
Developing Corpora using Wikipedia and Word2vec for Word Sense Disambiguation
Author(s) -
Farza Nurifan,
Riyanarto Sarno,
Cahyaningtyas Sekar Wahyuni
Publication year - 2018
Publication title -
Indonesian Journal of Electrical Engineering and Computer Science
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.241
H-Index - 17
eISSN - 2502-4760
pISSN - 2502-4752
DOI - 10.11591/ijeecs.v12.i3.pp1239-1246
Subject(s) - computer science , natural language processing , word2vec , artificial intelligence , semantic similarity , cosine similarity , word sense disambiguation , distributional semantics , information retrieval , wordnet , linguistics , word embedding
Word Sense Disambiguation (WSD) is one of the most difficult problems in artificial intelligence, often described as AI-hard or AI-complete. Many tasks can benefit from word sense disambiguation, including sentiment analysis, machine translation, search engine relevance, coherence, anaphora resolution, and inference. In this paper, we address the WSD problem using two small corpora. We propose using Word2vec and Wikipedia to develop the corpora. After developing the corpora, we measure the similarity between a sentence and each corpus using cosine similarity to determine the meaning of the ambiguous word. Finally, to improve accuracy, we use the Lesk algorithm and Wu-Palmer similarity to handle cases in which no word from the sentence appears in the corpora (we refer to this as semantic similarity). Our results show an accuracy of 86.94%, and the semantic similarity step improves the accuracy by 12.96% in determining the meaning of ambiguous words.
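The pipeline described in the abstract — build one small corpus per sense from Wikipedia text, train Word2vec, represent each sentence and each sense corpus as a mean word vector, and choose the sense whose corpus is most cosine-similar to the input sentence — can be illustrated with a minimal sketch. This is not the authors' implementation: the toy corpora, variable names, and Word2vec parameters below are assumptions, and the Lesk / Wu-Palmer fallback for out-of-corpus words is omitted.

```python
# Minimal, hypothetical sketch of corpus-based WSD with Word2vec and
# cosine similarity. The sense corpora here are toy stand-ins for text
# collected from Wikipedia articles about each sense of "bank".
import numpy as np
from gensim.models import Word2Vec

corpus_finance = [
    "the bank approved the loan and opened a deposit account".split(),
    "customers withdraw money from the bank every month".split(),
]
corpus_river = [
    "the river bank was covered with grass and mud".split(),
    "they fished along the bank of the stream".split(),
]

# Train a single Word2vec model over both small corpora (toy parameters).
model = Word2Vec(corpus_finance + corpus_river, vector_size=50,
                 window=3, min_count=1, epochs=50, seed=1)

def sentence_vector(tokens, model):
    """Mean of the Word2vec vectors of the tokens known to the model."""
    vecs = [model.wv[t] for t in tokens if t in model.wv]
    return np.mean(vecs, axis=0) if vecs else None

def corpus_vector(corpus, model):
    """Mean sentence vector over all sentences in a sense corpus."""
    vecs = [sentence_vector(s, model) for s in corpus]
    return np.mean([v for v in vecs if v is not None], axis=0)

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def disambiguate(sentence, model, sense_corpora):
    """Pick the sense whose corpus vector is most similar to the sentence."""
    sv = sentence_vector(sentence.lower().split(), model)
    scores = {sense: cosine(sv, corpus_vector(corpus, model))
              for sense, corpus in sense_corpora.items()}
    return max(scores, key=scores.get), scores

sense, scores = disambiguate("he opened an account at the bank", model,
                             {"finance": corpus_finance, "river": corpus_river})
print(sense, scores)
```

In the paper's method, when none of the sentence's words occur in the corpora (so a vector comparison is not possible), the meaning is instead chosen with the Lesk algorithm and Wu-Palmer similarity; a comparable fallback could be built with NLTK's `lesk` function and WordNet's `wup_similarity`, though the exact procedure used by the authors is not detailed in the abstract.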
