Open Access
Comparison of Spelling Error Detection in Turkish Texts with Machine Learning and Transformer-Based Approaches
Author(s) - Erturk Erdagi
Publication year - 2025
Publication title - IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3619865
Subject(s) - aerospace, bioengineering, communication, networking and broadcast technologies, components, circuits, devices and systems, computing and processing, engineered materials, dielectrics and plasmas, engineering profession, fields, waves and electromagnetics, general topics for engineers, geoscience, nuclear engineering, photonics and electrooptics, power, energy and industry applications, robotics and control systems, signal processing and analysis, transportation
This study compares the performance of five models for spelling error detection, a crucial task in natural language processing. The Logistic Regression model, a traditional machine learning technique, was investigated in both its default and hyperparameter-optimized forms. The study also examined an LSTM architecture, which performs well on sequential data; the BERT model, with its strong capacity for contextual language representation; and the Soft-Masked BERT model, designed specifically for the spelling error detection task. All models were evaluated on a balanced Turkish dataset whose sentences are labeled according to whether they contain at least one spelling error, and comparisons were made using standard classification metrics: accuracy, precision, recall, F1-score, and ROC AUC. The findings show that Logistic Regression with default parameters failed to distinguish between the classes, reaching only 0.366 accuracy and F1-score, while the hyperparameter-optimized version of the model improved substantially to 0.675 accuracy and F1-score. The LSTM model achieved partial success in learning the sequential structure, with 0.713 accuracy and 0.7 F1-score, outperforming Logistic Regression. The BERT model reached 0.886 accuracy and 0.885 F1-score, surpassing both the Logistic Regression and LSTM models, and the Soft-Masked BERT model achieved the best results with 0.897 accuracy and F1-score. These results demonstrate that transformer-based models perform better on tasks involving both the morphological and contextual structure of the language. The study compares the effectiveness of various model architectures for detecting spelling errors in Turkish texts and highlights the contribution of context sensitivity to classification success.
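As an illustration of the evaluation setup described in the abstract, the sketch below computes accuracy, precision, recall, and F1-score for a binary spelling-error-detection classifier from first principles. This is not the paper's code, and the labels are toy data, not results from the study; label 1 stands for "sentence contains at least one spelling error", 0 for "error-free".

```python
# Hypothetical sketch, not the study's implementation: the standard
# binary classification metrics reported in the paper, computed by hand.
def metrics(y_true, y_pred):
    # Confusion-matrix counts for the positive class (1 = has error)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1

# Toy gold labels and predictions (illustrative only)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 0, 0, 1]  # one false negative, one false positive
print(metrics(y_true, y_pred))  # (0.75, 0.75, 0.75, 0.75)
```

On a balanced dataset such as the one used in the study, accuracy and F1-score tend to move together, which is consistent with the paper reporting near-identical values for the two metrics per model.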
