Open Access
Weakly Supervised Word Sense Disambiguation Using Automatically Labelled Collections
Author(s) - Natalia Loukachevitch
Publication year - 2021
Publication title -
Trudy Instituta Sistemnogo Programmirovaniâ RAN (Proceedings of ISP RAS)
Language(s) - English
Resource type - Journals
eISSN - 2220-6426
pISSN - 2079-8156
DOI - 10.15514/ispras-2021-33(6)-13
Subject(s) - word sense disambiguation , bootstrapping , weakly supervised learning , computer science , artificial intelligence , natural language processing , machine learning , wordnet , semeval , labeled data
State-of-the-art supervised word sense disambiguation models require large sense-tagged training sets. However, many low-resource languages, including Russian, lack such data. To cope with the knowledge acquisition bottleneck for Russian, we first use a method based on the concept of monosemous relatives to automatically generate a labelled training collection. We then introduce three weakly supervised models trained on this synthetic data. Our work builds on the bootstrapping approach: starting from this seed of tagged instances, an ensemble of classifiers is used to label samples from unannotated corpora. Alongside this method, we exploit several techniques to augment the new training examples. We show that even a simple bootstrapping approach based on an ensemble of weakly supervised models already improves over the initial word sense disambiguation models.
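The bootstrapping procedure described in the abstract (an ensemble trained on a small seed of tagged instances, which then labels examples from an unannotated corpus and grows the training set) can be illustrated with a minimal, self-contained sketch. This is not the authors' actual models: the `train_centroids`, `predict`, and `bootstrap` functions and the unanimous-vote acceptance rule are illustrative assumptions, with a trivial bag-of-words centroid classifier standing in for the paper's weakly supervised models.

```python
import random
from collections import Counter

def train_centroids(examples):
    """Build one token-count centroid per sense from (tokens, sense) pairs."""
    centroids = {}
    for tokens, sense in examples:
        centroids.setdefault(sense, Counter()).update(tokens)
    return centroids

def predict(centroids, tokens):
    """Pick the sense whose centroid shares the most token mass with the context."""
    best_sense, best_score = None, -1
    for sense, counts in centroids.items():
        score = sum(counts[t] for t in tokens)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense

def bootstrap(seed, unlabeled, rounds=3, n_models=3):
    """Self-training loop: an ensemble labels unannotated contexts,
    and unanimously labelled ones are added to the training set."""
    labeled = list(seed)
    pool = list(unlabeled)
    for _ in range(rounds):
        # Ensemble diversity via random subsampling of the labelled data
        # (an illustrative choice, not the paper's exact setup).
        k = max(1, 2 * len(labeled) // 3)
        models = [train_centroids(random.sample(labeled, k))
                  for _ in range(n_models)]
        remaining = []
        for tokens in pool:
            votes = Counter(predict(m, tokens) for m in models)
            sense, count = votes.most_common(1)[0]
            if count == n_models and sense is not None:  # unanimous vote
                labeled.append((tokens, sense))
            else:
                remaining.append(tokens)
        pool = remaining
    return labeled
```

In this sketch the acceptance criterion is unanimity of the ensemble; a confidence threshold per classifier would be an equally plausible variant, and the paper's augmentation techniques for the newly labelled examples are omitted entirely.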
