Open Access
Detecting and correcting real-word errors in Tamil sentences
Author(s) -
Ratnasingam Sakuntharaj,
Sinnathamby Mahesan
Publication year - 2018
Publication title -
Ruhuna Journal of Science
Language(s) - English
Resource type - Journals
eISSN - 2536-8400
pISSN - 1800-279X
DOI - 10.4038/rjs.v9i2.43
Subject(s) - tamil , word (group theory) , natural language processing , computer science , artificial intelligence , speech recognition , linguistics , philosophy
Spell checkers deal with two types of errors: non-word errors and real-word errors. Non-word errors fall into two categories: either the word itself is invalid, or the word is valid but absent from the lexicon. A real-word error is a valid word that is inappropriate in the context of its sentence. This paper proposes an approach to correcting real-word errors in Tamil. A bigram probability model, built from a 3 GB corpus of Tamil text, is used to determine whether a valid word is appropriate in the context of the sentence. A word found to be inappropriate is marked as a real-word error; the minimum edit distance technique is then used to find lexically similar words, and the appropriateness of each such word is measured by a word-level n-gram language model. A hash table keyed by word length speeds up the search for lexically similar words: for a word of length m found to be 'inappropriate', only words of lengths m-1 to m+1 are considered. Test results show that the suggestions generated by the system are more than 98% accurate, as judged by a scholar of Tamil.
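The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The class name `BigramChecker`, the boundary markers, the zero-probability threshold, the edit-distance cutoff of 1, and the scoring of a word against both its left and right neighbour are all assumptions made for illustration. A four-sentence English toy corpus stands in for the 3 GB Tamil corpus, and word length is measured with Python's `len()` (Unicode code points), which may differ from how the paper counts Tamil letters.

```python
from collections import Counter, defaultdict

BOS, EOS = "<s>", "</s>"  # hypothetical sentence-boundary markers


def edit_distance(a, b):
    """Levenshtein distance using the standard two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


class BigramChecker:
    """Toy real-word error checker (illustrative only)."""

    def __init__(self, corpus_sentences):
        self.unigrams = Counter()
        self.bigrams = Counter()
        self.by_length = defaultdict(set)  # hash table keyed by word length
        for sentence in corpus_sentences:
            tokens = [BOS] + sentence.split() + [EOS]
            self.unigrams.update(tokens)
            self.bigrams.update(zip(tokens, tokens[1:]))
            for word in tokens[1:-1]:
                self.by_length[len(word)].add(word)

    def prob(self, prev, word):
        """Unsmoothed maximum-likelihood estimate of P(word | prev)."""
        if self.unigrams[prev] == 0:
            return 0.0
        return self.bigrams[(prev, word)] / self.unigrams[prev]

    def context_score(self, prev, word, nxt):
        """Assumed appropriateness measure: P(word | prev) * P(nxt | word)."""
        return self.prob(prev, word) * self.prob(word, nxt)

    def candidates(self, word, max_dist=1):
        """Lexically similar words, searched only in buckets m-1, m, m+1."""
        m = len(word)
        pool = set().union(*(self.by_length.get(n, set())
                             for n in (m - 1, m, m + 1)))
        return {c for c in pool
                if c != word and edit_distance(word, c) <= max_dist}

    def check(self, sentence, threshold=0.0):
        """Return (position, word, ranked suggestions) for each flagged word
        that has at least one candidate scoring above the threshold."""
        tokens = [BOS] + sentence.split() + [EOS]
        report = []
        for i in range(1, len(tokens) - 1):
            prev, word, nxt = tokens[i - 1], tokens[i], tokens[i + 1]
            if self.context_score(prev, word, nxt) > threshold:
                continue  # the word looks appropriate in this context
            ranked = sorted(((self.context_score(prev, c, nxt), c)
                             for c in self.candidates(word)), reverse=True)
            suggestions = [c for score, c in ranked if score > threshold]
            if suggestions:
                report.append((i - 1, word, suggestions))
        return report


TOY_CORPUS = [
    "we eat meat for dinner",
    "they eat meat every day",
    "we meet friends every day",
    "they meet friends for dinner",
]

checker = BigramChecker(TOY_CORPUS)
print(checker.check("we eat meet for dinner"))  # -> [(2, 'meet', ['meat'])]
```

In this sketch the neighbours of an erroneous word also receive a zero score, because each bigram is shared by two adjacent words; they are dropped from the report only because no candidate improves on them. With unsmoothed estimates, any bigram unseen in the corpus is treated as inappropriate, so a practical system would presumably need a far larger corpus, as the paper uses, or some form of smoothing.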
