Open Access
Samawa Language Part of Speech Tagging with Probabilistic Approach: Comparison of Unigram, HMM and TnT Models
Author(s) - Trienani Hariyanti, Saori Aida, Hiroyuki Kameda
Publication year - 2019
Publication title - Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/1235/1/012013
Subject(s) - hidden Markov model, trigram, computer science, natural language processing, artificial intelligence, language model, punctuation, probabilistic logic, sentence, speech recognition
Samawa is a living language with more than 500K native speakers on Sumbawa island, Indonesia. However, very few resources are available for it, and little effort has gone into developing Natural Language Processing (NLP) tools for it. In this paper, we evaluate three probabilistic models, Unigram, Hidden Markov Model (HMM) and Trigrams’n’Tags (TnT), for part-of-speech tagging, the task of labeling each word or punctuation mark in a sentence. We used k-fold cross-validation (with k = 5 and 10) on a tagged corpus of around 20K tokens with 24 tags. The TnT model performed best, reaching 96.18%, compared with the other models. This result suggests that the TnT model could be used to extend Samawa corpora and support other NLP tasks in the future.
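
To make the evaluation setup concrete, here is a minimal sketch of a unigram tagger scored with k-fold cross-validation. It is not the authors' implementation: the function names, the fallback to the most frequent tag for unseen words, the round-robin fold split, and the tiny English `TOY_CORPUS` are all assumptions made for illustration, and the corpus is a placeholder, not Samawa data.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus (NOT Samawa data): lists of (word, tag) pairs.
TOY_CORPUS = [
    [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB"), (".", "PUNCT")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB"), (".", "PUNCT")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB"), (".", "PUNCT")],
    [("the", "DET"), ("cat", "NOUN"), ("runs", "VERB"), (".", "PUNCT")],
    [("a", "DET"), ("cat", "NOUN"), ("runs", "VERB"), (".", "PUNCT")],
]


def train_unigram(tagged_sents):
    """Map each word to its most frequent tag in the training data."""
    counts = defaultdict(Counter)
    tag_totals = Counter()
    for sent in tagged_sents:
        for word, tag in sent:
            counts[word][tag] += 1
            tag_totals[tag] += 1
    # Assumption: unseen words fall back to the overall most frequent tag.
    default_tag = tag_totals.most_common(1)[0][0]
    model = {word: c.most_common(1)[0][0] for word, c in counts.items()}
    return model, default_tag


def tag_words(model, default_tag, words):
    """Tag each word independently of its context."""
    return [(w, model.get(w, default_tag)) for w in words]


def k_fold_accuracy(tagged_sents, k):
    """Average token-level accuracy over k folds (requires k <= number of sentences)."""
    folds = [tagged_sents[i::k] for i in range(k)]  # round-robin split
    scores = []
    for i, test in enumerate(folds):
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        model, default_tag = train_unigram(train)
        correct = total = 0
        for sent in test:
            predicted = tag_words(model, default_tag, [w for w, _ in sent])
            for (_, gold), (_, guess) in zip(sent, predicted):
                correct += int(gold == guess)
                total += 1
        scores.append(correct / total)
    return sum(scores) / k
```

Calling `k_fold_accuracy(TOY_CORPUS, 5)` returns the mean held-out accuracy on the toy data. HMM and TnT taggers differ from this baseline in that they also condition on the preceding one or two tags rather than on the word alone.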
