Open Access

Extremist Ideology Classification in Kazakh: A Multi-Class Approach Using Machine Learning and Psycholinguistic Analysis

Author(s) - Shynar Mussiraliyeva, Milana Bolatbek, Kymbat Baisylbayeva

Publication year - 2025

Publication title - IEEE Access

Language(s) - English

Resource type - Magazines

SCImago Journal Rank - 0.587

H-Index - 127

eISSN - 2169-3536

DOI - 10.1109/access.2025.3596601

Subject(s) - aerospace; bioengineering; communication, networking and broadcast technologies; components, circuits, devices and systems; computing and processing; engineered materials, dielectrics and plasmas; engineering profession; fields, waves and electromagnetics; general topics for engineers; geoscience; nuclear engineering; photonics and electrooptics; power, energy and industry applications; robotics and control systems; signal processing and analysis; transportation

This paper presents a new approach to analyzing extremist content in the Kazakh language on social media using machine learning and natural language processing techniques. With the rapid growth of online data, especially on social networks, tools are urgently needed to identify and classify extremist ideologies. We focus on four categories (propaganda, recruitment, radicalization, and neutral content) and combine traditional text vectorization methods, machine learning algorithms, and a dedicated psycholinguistic analysis module (PLAM) adapted for the Kazakh language. PLAM-based psycholinguistic features are added to improve classification accuracy and to capture nuanced emotions in extremist texts. Experimental results demonstrate the effectiveness of this hybrid approach. The combination of CountVectorizer + Logistic Regression + PLAM achieved the highest performance among traditional models (F1-score: 0.9305, Accuracy: 0.9308, ROC AUC: 0.9892). Among deep learning models, the BERT + LSTM model yielded the best results (F1-score: 0.9481, Accuracy: 0.9485, ROC AUC: 0.9918), followed by the standalone BERT model (F1-score: 0.9412, Accuracy: 0.9414, ROC AUC: 0.9901). These findings indicate that combining contextual embeddings with sequential modeling enhances classification performance, particularly for ideologically complex categories. This research provides an effective framework for multilingual text analysis and contributes to improved monitoring and prevention of extremist content in underrepresented languages such as Kazakh. Future work will focus on refining these methods and exploring their application in other domains for robust content moderation and security in the digital space.
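The best-performing traditional configuration named in the abstract (CountVectorizer + Logistic Regression + PLAM) can be illustrated with a minimal scikit-learn sketch. This is not the authors' implementation. The abstract does not describe PLAM's internals, so the `placeholder_features` function below is a hypothetical stand-in that computes a few surface statistics. The training texts are synthetic placeholder tokens, and the helper names (`placeholder_features`, `build_matrix`) and the choice `max_iter=1000` are assumptions made for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# The four categories named in the abstract.
LABELS = ["propaganda", "recruitment", "radicalization", "neutral"]


def placeholder_features(texts):
    """Hypothetical stand-in for PLAM output: one dense row per text.

    The real module produces psycholinguistic features; here we only use
    token count, exclamation-mark count, and mean token length.
    """
    rows = []
    for text in texts:
        tokens = text.split()
        mean_len = sum(len(t) for t in tokens) / max(len(tokens), 1)
        rows.append([len(tokens), text.count("!"), mean_len])
    return np.asarray(rows, dtype=float)


def build_matrix(vectorizer, texts, fit=False):
    """Concatenate bag-of-words counts with the extra feature columns."""
    counts = vectorizer.fit_transform(texts) if fit else vectorizer.transform(texts)
    extra = csr_matrix(placeholder_features(texts))
    return hstack([counts, extra]).tocsr()


# Synthetic placeholder documents (not real data), two per class.
train_texts = [
    "aaa bbb ccc !", "aaa bbb ddd !",
    "eee fff ggg", "eee fff hhh",
    "iii jjj kkk !!", "iii jjj lll !!",
    "mmm nnn ooo", "mmm nnn ppp",
]
train_labels = [label for label in LABELS for _ in range(2)]

vectorizer = CountVectorizer()
X_train = build_matrix(vectorizer, train_texts, fit=True)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, train_labels)

# Score two unseen placeholder documents; each row holds one probability per class.
X_new = build_matrix(vectorizer, ["aaa bbb !", "mmm nnn"])
proba = clf.predict_proba(X_new)
pred = clf.predict(X_new)
```

Stacking the extra columns next to the count matrix keeps the classifier unchanged, which makes it easy to compare a run with and without the additional features. In practice the extra columns would usually be scaled before fitting a linear model.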
