Research Library

Open Access
InstructTA: Instruction-Tuned Targeted Attack for Large Vision-Language Models
Author(s): Xunguang Wang, Zhenlan Ji, Pingchuan Ma, Zongjie Li, Shuai Wang
Publication year: 2024
Abstract: Large vision-language models (LVLMs) have demonstrated incredible capability in image understanding and response generation. However, this rich visual interaction also makes LVLMs vulnerable to adversarial examples. In this paper, we formulate a novel and practical gray-box attack scenario in which the adversary can only access the visual encoder of the victim LVLM, without knowledge of its prompts (which are often proprietary to service providers and not publicly available) or its underlying large language model (LLM). This practical setting poses challenges to the cross-prompt and cross-model transferability of targeted adversarial attacks, which aim to confuse the LVLM into outputting a response that is semantically similar to the attacker's chosen target text. To this end, we propose an instruction-tuned targeted attack (dubbed InstructTA) that delivers targeted adversarial attacks on LVLMs with high transferability. First, we use a public text-to-image generative model to "reverse" the target response into a target image, and employ GPT-4 to infer a reasonable instruction $\boldsymbol{p}^\prime$ from the target response. We then form a local surrogate model (sharing the same visual encoder with the victim LVLM) to extract instruction-aware features of an adversarial image example and the target image, and minimize the distance between these two features to optimize the adversarial example. To further improve transferability, we augment the instruction $\boldsymbol{p}^\prime$ with instructions paraphrased by an LLM. Extensive experiments demonstrate the superiority of our proposed method in targeted attack performance and transferability.
Language(s): English
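
Below is a minimal sketch of the core optimization the abstract describes: perturbing an image so that a surrogate model's instruction-aware features match those of the "reversed" target image, averaged over paraphrased instructions to improve cross-prompt transferability. It assumes a PyTorch callable surrogate(image, instruction) that returns a feature tensor; the encoder interface, the L-infinity budget, and all hyperparameters (epsilon, alpha, steps) are illustrative assumptions under that reading of the abstract, not the authors' released implementation.

import torch
import torch.nn.functional as F

def instructta_sketch(surrogate, x, x_target, instructions,
                      epsilon=8 / 255, alpha=1 / 255, steps=100):
    """Optimize an adversarial image whose instruction-aware features
    match those of the target image, averaged over paraphrased
    instructions p'. All hyperparameters here are illustrative."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        # Average the feature distance over paraphrased instructions
        # to encourage cross-prompt transferability.
        loss = torch.zeros((), device=x.device)
        for p in instructions:
            f_adv = surrogate(x_adv, p)          # adversarial features
            with torch.no_grad():
                f_tgt = surrogate(x_target, p)   # target features (fixed)
            loss = loss + F.mse_loss(f_adv, f_tgt)
        loss = loss / len(instructions)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            # Signed-gradient descent on the feature distance, then
            # project back into the epsilon-ball around the clean image
            # and the valid pixel range.
            x_adv = x_adv - alpha * grad.sign()
            x_adv = torch.clamp(x_adv, min=x - epsilon, max=x + epsilon)
            x_adv = x_adv.clamp(0.0, 1.0)
        x_adv = x_adv.detach()
    return x_adv

The target features are recomputed each step only to keep the sketch compact; precomputing them once per instruction outside the loop would be the more efficient design.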
