Open Access
Comparison of Spelling Error Detection in Turkish Texts with Machine Learning and Transformer-Based Approaches
Author(s) - Erturk Erdagi
Publication year - 2025
Publication title - IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3619865
Subject(s) - aerospace, bioengineering, communication, networking and broadcast technologies, components, circuits, devices and systems, computing and processing, engineered materials, dielectrics and plasmas, engineering profession, fields, waves and electromagnetics, general topics for engineers, geoscience, nuclear engineering, photonics and electrooptics, power, energy and industry applications, robotics and control systems, signal processing and analysis, transportation
This study compares the performance of five models for spelling error detection, a crucial task in natural language processing. The Logistic Regression model, a traditional machine learning technique, was investigated in both its default and hyperparameter-optimized forms. The study also examined an LSTM architecture, which performs well on sequential data; the BERT model, with its strong capacity for contextual language representation; and the Soft-Masked BERT model, designed specifically for the spelling error detection task. All models were evaluated on a balanced Turkish dataset whose sentences are labeled according to whether they contain at least one spelling error, and comparisons were made using standard classification metrics: accuracy, precision, recall, F1-score, and ROC AUC. The findings show that Logistic Regression with default parameters failed to distinguish between the classes, reaching only 0.366 accuracy and F1-score, while the hyperparameter-optimized version of the model improved substantially to 0.675 accuracy and F1-score. The LSTM model achieved partial success in learning the sequential structure, with 0.713 accuracy and 0.7 F1-score, outperforming Logistic Regression. The BERT model reached 0.886 accuracy and 0.885 F1-score, surpassing both the Logistic Regression and LSTM models, and the Soft-Masked BERT model achieved the best results with 0.897 accuracy and F1-score. These results demonstrate that transformer-based models perform better on tasks involving both the morphological and contextual structure of the language. The study compares the effectiveness of various model architectures for detecting spelling errors in Turkish texts and highlights the contribution of context sensitivity to classification success.
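As an illustration of the evaluation setup described in the abstract, the sketch below computes accuracy, precision, recall, and F1-score for a binary spelling-error-detection classifier from first principles. This is not the paper's code, and the labels are toy data, not results from the study; label 1 stands for "sentence contains at least one spelling error", 0 for "error-free".

```python
# Hypothetical sketch, not the study's implementation: the standard
# binary classification metrics reported in the paper, computed by hand.
def metrics(y_true, y_pred):
    # Confusion-matrix counts for the positive class (1 = has error)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1

# Toy gold labels and predictions (illustrative only)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 0, 0, 1]  # one false negative, one false positive
print(metrics(y_true, y_pred))  # (0.75, 0.75, 0.75, 0.75)
```

On a balanced dataset such as the one used in the study, accuracy and F1-score tend to move together, which is consistent with the paper reporting near-identical values for the two metrics per model.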
