
Privacy-Aware Detection for Large Language Models Using a Hybrid BiLSTM-HMM Approach
Author(s) -
Maryam Abbasalizadeh,
Sashank Narain
Publication year - 2025
Publication title -
IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/ACCESS.2025.3587988
Subject(s) - aerospace, bioengineering, communication, networking and broadcast technologies, components, circuits, devices and systems, computing and processing, engineered materials, dielectrics and plasmas, engineering profession, fields, waves and electromagnetics, general topics for engineers, geoscience, nuclear engineering, photonics and electrooptics, power, energy and industry applications, robotics and control systems, signal processing and analysis, transportation
Large Language Models (LLMs) have transformed natural language processing, enabling applications such as conversational agents and machine translation. However, their deployment introduces significant privacy concerns, including the memorization and unintended disclosure of sensitive data. Existing privacy-preserving techniques, such as Differential Privacy and federated learning, struggle to balance data protection, model utility, and computational efficiency. To address these limitations, we propose a lightweight privacy-disclosure detection system that combines Bidirectional Long Short-Term Memory (BiLSTM) networks with Hidden Markov Models (HMMs) in a novel modeling pipeline. Our approach employs the Predefined and Sensitive Labeling (PSL) technique, a generative labeling approach that extracts meaningful patterns from data. These patterns are then used to train a BiLSTM model capable of proactively identifying sensitive information in real-time user interactions with LLMs. Because BiLSTMs do not provide probability estimates for the detected private data, we design an HMM that estimates the probability of its occurrence. Using the Forward algorithm, our system quantifies privacy risks, enabling users to revise inputs prior to submission and thereby enhancing data privacy. Trained on synthetic data using the PSL technique, the model achieves approximately 99.94% precision, recall, and F1-score, and detects previously unseen sensitive information in synthetic datasets with ≈99.98% accuracy across 55,000 sentences. In addition, the model, trained on patterns derived from synthetic data, achieves ≈99.99% accuracy when evaluated on a real-world dataset across varying sentence structures, demonstrating strong generalizability in detecting sensitive information regardless of the data source. Importantly, the model provides real-time predictions with an average execution time of 35.46 milliseconds, satisfying the speed requirements for practical deployment. It also trains 45.5 times faster than a state-of-the-art framework, offering high computational efficiency without compromising accuracy.
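
The abstract describes the pipeline at a high level: a BiLSTM tagger flags potentially sensitive tokens, and an HMM scored with the Forward algorithm turns the detections into a probability of private-data occurrence. The Python sketch below illustrates these two stages only as a minimal example; it is not the authors' implementation, and the class names, the two-tag scheme, and all parameter values are illustrative assumptions (PyTorch and NumPy are used for compactness).

# Minimal sketch (not the authors' released code): a BiLSTM token tagger that
# flags potentially sensitive tokens, followed by an HMM Forward pass that
# turns the predicted tag sequence into a privacy-risk probability.
# Class names, the 2-tag scheme, and all parameters below are illustrative
# assumptions, not values from the paper.
import numpy as np
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    """Tags each token as non-sensitive (0) or sensitive (1)."""
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128, num_tags=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.bilstm = nn.LSTM(embed_dim, hidden_dim,
                              batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_dim, num_tags)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) -> per-token tag logits
        emb = self.embed(token_ids)
        hidden, _ = self.bilstm(emb)
        return self.classifier(hidden)

def hmm_forward_log_prob(observations, log_pi, log_A, log_B):
    """Forward algorithm: log P(observation sequence) under an HMM.

    observations: observed symbol indices (here, the predicted tags)
    log_pi: (S,)   log initial-state probabilities
    log_A:  (S, S) log transition probabilities
    log_B:  (S, O) log emission probabilities
    """
    alpha = log_pi + log_B[:, observations[0]]
    for obs in observations[1:]:
        # log-sum-exp over previous states, then add the emission term
        alpha = np.logaddexp.reduce(alpha[:, None] + log_A, axis=0) + log_B[:, obs]
    return np.logaddexp.reduce(alpha)

# Usage sketch: tag a (random stand-in) sentence, then score the tag sequence
# under a small hand-set "disclosure" HMM. In the paper's pipeline the HMM
# would instead be fit on PSL-labeled synthetic data.
model = BiLSTMTagger(vocab_size=10_000)
token_ids = torch.randint(0, 10_000, (1, 12))
tags = model(token_ids).argmax(dim=-1).squeeze(0).tolist()

log_pi = np.log([0.9, 0.1])                   # start in benign vs. disclosure state
log_A = np.log([[0.8, 0.2], [0.3, 0.7]])      # state transitions
log_B = np.log([[0.95, 0.05], [0.10, 0.90]])  # emit tag 0/1 from each state
print("privacy-risk log-probability:",
      hmm_forward_log_prob(tags, log_pi, log_A, log_B))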