A linguistic features-based approach for the functional analysis of disinformation in Spanish

Eduardo Puraivan; Fabian Riquelme; Rene Venegas


Open Access


Author(s) - Eduardo Puraivan, Fabian Riquelme, Rene Venegas

Publication year - 2025

Publication title - IEEE Access

Language(s) - English

Resource type - Magazines

SCImago Journal Rank - 0.587

H-Index - 127

eISSN - 2169-3536

DOI - 10.1109/access.2025.3595750

Subject(s) - aerospace, bioengineering, communication, networking and broadcast technologies, components, circuits, devices and systems, computing and processing, engineered materials, dielectrics and plasmas, engineering profession, fields, waves and electromagnetics, general topics for engineers, geoscience, nuclear engineering, photonics and electrooptics, power, energy and industry applications, robotics and control systems, signal processing and analysis, transportation

Information disorder has significant negative impacts on contemporary societies. This study presents a hybrid methodology that combines machine learning and natural language processing to analyze corpora of disinformation texts in Spanish. The approach adapts linguistic features originally developed for English to another major but less researched language, and it incorporates 251 features organized into six categories, surpassing previous methods in both the number and the organization of features. Applied to the CLNews dataset of Spanish rumors, the analysis identified 17 features with statistically significant differences between false and real rumors. The linguistic analysis reveals that false rumors are characterized by more emotional language, greater sentence fragmentation, frequent use of auxiliary verbs, and lower information density, which creates an appearance of detail. Additionally, using BERT, a large language model (LLM), five topics were identified among the false rumors, each exhibiting different strategies in terms of fragmentation, grammatical complexity, and information density. Building on these findings, the linguistic features were used to train machine learning classifiers, with a linear SVM achieving 86% accuracy. The results show that classical machine learning models trained on carefully chosen linguistic features can deliver competitive results, surpassing BETO (57%) and RoBERTa-BNE (64%) in accuracy on the CLNews dataset. Moreover, these models perform strongly when the same features are applied to a different dataset, and they continue to perform well when the feature selection is adjusted to fit the new context. The methodology thus offers a replicable framework for future research on disinformation and text analysis in Spanish, enhancing the interpretability of results.
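To make the idea of a linguistic feature concrete, the following minimal Python sketch computes three toy measurements loosely inspired by the properties named in the abstract (sentence fragmentation, information density, emotional emphasis). It is an illustration only: the function name `linguistic_features`, the three feature definitions, and the short `FUNCTION_WORDS` list are assumptions made for this example and are not the 251 features or the implementation used in the study.

```python
import re

# Hypothetical, deliberately tiny list of Spanish function words.
# A real analysis would rely on a proper lexicon or a POS tagger.
FUNCTION_WORDS = {
    "el", "la", "los", "las", "un", "una", "de", "del", "que", "y", "en",
    "a", "se", "no", "ha", "han", "es", "por", "con", "para", "lo", "su", "al",
}


def linguistic_features(text: str) -> dict:
    """Return three toy features for one text (illustrative assumptions)."""
    # Split on runs of sentence-ending punctuation; drop empty fragments.
    sentences = [s for s in re.split(r"[.!?…]+", text) if s.strip()]
    # Tokens are runs of Unicode letters (digits and underscores excluded).
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    content_words = [t for t in tokens if t not in FUNCTION_WORDS]
    n_tokens = max(len(tokens), 1)  # guard against division by zero
    return {
        # Shorter sentences on average suggest more fragmentation.
        "mean_sentence_length": len(tokens) / max(len(sentences), 1),
        # Share of non-function words, a rough proxy for information density.
        "content_word_ratio": len(content_words) / n_tokens,
        # Exclamation marks per token, a rough proxy for emotional emphasis.
        "exclamations_per_token": text.count("!") / n_tokens,
    }


if __name__ == "__main__":
    example = "¡Increíble! No lo vas a creer. Se ha confirmado."
    for name, value in linguistic_features(example).items():
        print(f"{name}: {value:.3f}")
```

Feature vectors of this kind, computed for every text in a labeled corpus, could then be compared between false and real rumors or passed to a classifier such as a linear SVM.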
