Open Access
Beyond N-Grams: Enhancing String Kernels with Transformer-Guided Semantic Insights
Author(s) - Nazar Zaki, Reem Alderei, Mahra Alketbi, Alia Alkaabi, Fatima Alneyadi, Nadeen Zaki
Publication year - 2025
Publication title - IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3576076
Subject(s) - aerospace, bioengineering, communication, networking and broadcast technologies, components, circuits, devices and systems, computing and processing, engineered materials, dielectrics and plasmas, engineering profession, fields, waves and electromagnetics, general topics for engineers, geoscience, nuclear engineering, photonics and electrooptics, power, energy and industry applications, robotics and control systems, signal processing and analysis, transportation
Rapid advances in large language models (LLMs) have produced sophisticated AI-generated text, making it significantly harder to distinguish machine-generated content from authentic human writing. This study presents a novel hybrid framework that integrates string kernel approaches with deep contextual embeddings from state-of-the-art transformers for robust AI-generated text detection. We propose and evaluate four kernel-based methods, namely Attention-Augmented Kernel, Error Pattern Analysis, Transformer-Guided N-gram Selection, and a Custom Kernel Function, each capturing distinct semantic and structural properties of text. Extensive experiments on diverse datasets, featuring texts generated and enhanced by leading LLMs including GPT-3.5, GPT-4, DeepSeek, and Kimi, demonstrate the superior performance of the proposed methods. In particular, Transformer-Guided N-gram Selection and the Custom Kernel Function consistently outperform baseline models, achieving near-perfect detection accuracy with significantly reduced computational complexity. Comprehensive hyperparameter optimization further supports the methods' effectiveness and practical applicability. The publicly available datasets and empirical evaluations provide benchmarks for future research. This work aims to set a new standard for AI-text detection, improving reliability, efficiency, and scalability in real-world applications.
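
For readers unfamiliar with the n-gram string kernels that the title refers to, the following is a minimal sketch of a plain character n-gram (spectrum) kernel. It is not one of the four methods proposed in the paper, whose details are not given in the abstract; the function names and the default n = 3 are assumptions chosen for illustration only.

```python
from collections import Counter
import math


def ngram_counts(text, n=3):
    """Count every contiguous character n-gram in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def spectrum_kernel(a, b, n=3):
    """Unnormalized spectrum kernel: dot product of n-gram count vectors."""
    ca, cb = ngram_counts(a, n), ngram_counts(b, n)
    # Iterate over the smaller profile; missing n-grams count as zero.
    if len(ca) > len(cb):
        ca, cb = cb, ca
    return sum(count * cb[gram] for gram, count in ca.items())


def normalized_kernel(a, b, n=3):
    """Cosine-normalized kernel value in [0, 1]; 0.0 if either text has no n-grams."""
    kaa = spectrum_kernel(a, a, n)
    kbb = spectrum_kernel(b, b, n)
    if kaa == 0 or kbb == 0:
        return 0.0
    return spectrum_kernel(a, b, n) / math.sqrt(kaa * kbb)
```

A kernel of this kind compares two texts purely by shared surface substrings. As the abstract describes it, the proposed framework adds contextual information from transformer embeddings on top of such kernels.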
