Imitation Game for Adversarial Disillusion with Chain-of-Thought Role-Play in Generative AI

Open Access

Author(s) - Ching-Chun Chang, Fan-Yun Chen, Shih-Hong Gu, Kai Gao, Hanrui Wang, Isao Echizen

Publication year - 2025

Publication title - IEEE Access

Language(s) - English

Resource type - Magazines

SCImago Journal Rank - 0.587

H-Index - 127

eISSN - 2169-3536

DOI - 10.1109/access.2025.3574016

Subject(s) - aerospace, bioengineering, communication, networking and broadcast technologies, components, circuits, devices and systems, computing and processing, engineered materials, dielectrics and plasmas, engineering profession, fields, waves and electromagnetics, general topics for engineers, geoscience, nuclear engineering, photonics and electrooptics, power, energy and industry applications, robotics and control systems, signal processing and analysis, transportation

As the cornerstone of artificial intelligence, machine perception confronts a fundamental threat posed by adversarial illusions. These adversarial attacks manifest in two primary forms: deductive illusion, where specific stimuli are crafted based on the victim model’s general decision logic, and inductive illusion, where the victim model’s general decision logic is shaped by specific stimuli. The former exploits the model’s decision boundaries to create a stimulus that, when applied, interferes with its decision-making process. The latter reinforces a conditioned reflex in the model, embedding a backdoor during its learning phase that, when triggered by a stimulus, causes aberrant behaviours. The multifaceted nature of adversarial illusions calls for a unified defence framework that addresses vulnerabilities across various forms of attack. In this study, we propose a disillusion paradigm based on the concept of an imitation game. At the heart of the imitation game lies a multimodal generative agent, steered by chain-of-thought reasoning, which observes, internalises and reconstructs the semantic essence of a sample, liberated from the classic pursuit of reversing the sample to its original state. As a proof of concept, we conduct experimental simulations using a multimodal generative dialogue agent and evaluate the methodology under a variety of attack scenarios. Experimental results demonstrate that the proposed framework consistently neutralises both deductive and inductive adversarial illusions. Across a range of white-box and black-box attack scenarios, the imitation-based method achieved classification accuracies between 94% and 97% on a subset of the ImageNet dataset, significantly outperforming benchmark defences such as JPEG compression and DiffPure.
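
The observe, internalise and reconstruct loop described in the abstract can be pictured with a small sketch. This is not the authors' implementation: the class `ImitationDefence`, its fields `describe`, `regenerate` and `classify`, and the toy stand-in functions are all hypothetical names introduced here for illustration. In a real setting the first two callables would be backed by a multimodal generative agent and the third by the protected classifier. The point shown is only structural: the classifier sees a freshly generated imitation rather than the (possibly attacked) input.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ImitationDefence:
    """Hypothetical wrapper; every name here is an assumption, not the paper's API."""
    describe: Callable[[Any], str]    # observe + internalise: sample -> semantic description
    regenerate: Callable[[str], Any]  # reconstruct: description -> fresh imitation
    classify: Callable[[Any], str]    # the protected (victim) classifier

    def predict(self, sample: Any) -> str:
        description = self.describe(sample)        # keep only the semantic essence
        imitation = self.regenerate(description)   # no attempt to restore the original sample
        return self.classify(imitation)            # classify the imitation, not the input


# Toy stand-ins (assumptions): a "sample" is a dict holding its true content
# plus an optional adversarial trigger that hijacks the toy classifier.
def toy_classifier(sample: dict) -> str:
    return sample.get("trigger") or sample["content"]


def toy_describe(sample: dict) -> str:
    return f"a photo of a {sample['content']}"


def toy_regenerate(description: str) -> dict:
    # Rebuild a clean sample from the last word of the description.
    return {"content": description.rsplit(" ", 1)[-1], "trigger": None}


defence = ImitationDefence(toy_describe, toy_regenerate, toy_classifier)
attacked = {"content": "cat", "trigger": "dog"}

undefended_label = toy_classifier(attacked)  # the trigger wins
defended_label = defence.predict(attacked)   # the trigger does not survive regeneration
```

In this toy setup the trigger is discarded because the description carries only semantic content. Whether a real generative agent discards a given perturbation or backdoor trigger is an empirical question, which is what the reported experiments evaluate.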
