Dictionary-based Word Segmentation for Javanese

Open Access

Author(s) - Dipta Tanaya, Mirna Adriani

Publication year - 2016

Publication title - Procedia Computer Science

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.334

H-Index - 76

ISSN - 1877-0509

DOI - 10.1016/j.procs.2016.04.051

Subject(s) - computer science, word (group theory), natural language processing, text segmentation, artificial intelligence, word lists by frequency, character (mathematics), segmentation, speech recognition, linguistics, mathematics, philosophy, geometry, sentence

Word segmentation is the first step in processing a language that is written in a non-Latin script, such as Javanese script. In this study, we report our work on word segmentation using a dictionary-based approach. In the first phase, we generate all possible segmented word sequences using a word dictionary. The correct words are then selected based on the last character in a word, the last two characters in a word, the difference between two consecutive words, and the frequency of the word in an additional corpus. The experimental results show that identifying words by their frequency in the additional corpus yields the best accuracy, 91.08%.
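The two phases described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy dictionary, the frequency counts, and the scoring rule (a sum of add-one-smoothed log frequencies) are all assumptions made for illustration, and the input is romanized text rather than Javanese script.

```python
import math
from functools import lru_cache


def all_segmentations(text, dictionary):
    """Phase 1: return every split of `text` into words found in `dictionary`."""

    @lru_cache(maxsize=None)
    def split_from(start):
        # Reaching the end of the text yields one empty segmentation.
        if start == len(text):
            return [[]]
        results = []
        for end in range(start + 1, len(text) + 1):
            word = text[start:end]
            if word in dictionary:
                for rest in split_from(end):
                    results.append([word] + rest)
        return results

    return split_from(0)


def pick_by_frequency(candidates, corpus_freq):
    """Phase 2 (assumed scoring): prefer the candidate whose words are most
    frequent in an additional corpus, using add-one-smoothed log frequencies."""
    total = sum(corpus_freq.values())

    def score(words):
        return sum(
            math.log((corpus_freq.get(w, 0) + 1) / (total + 1)) for w in words
        )

    return max(candidates, key=score)


# Hypothetical toy data: romanized "aku mangan" ("I eat") with the space removed.
toy_dictionary = {"a", "ku", "aku", "mang", "an", "mangan"}
toy_corpus_freq = {"aku": 50, "mangan": 30, "ku": 5, "an": 3, "mang": 2, "a": 1}

candidates = all_segmentations("akumangan", toy_dictionary)
best = pick_by_frequency(candidates, toy_corpus_freq)
print(len(candidates), best)  # 4 ['aku', 'mangan']
```

Because every smoothed log frequency is negative, this scoring rule also favors segmentations with fewer, more common words. The other selection criteria named in the abstract (last character, last two characters, difference between consecutive words) would replace the `score` function, but the abstract does not give enough detail to sketch them.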
