Adversarial attacks on text classification models using layer‐wise relevance propagation




Author(s) - Xu Jincheng, Du Qingfeng

Publication year - 2020

Publication title - International Journal of Intelligent Systems

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 1.291

H-Index - 87

eISSN - 1098-111X

pISSN - 0884-8173

DOI - 10.1002/int.22260

Subject(s) - interpretability, computer science, adversarial system, relevance (law), artificial intelligence, machine learning, scalability, deep learning, feature (linguistics), transparency (behavior), deep neural networks, linguistics, philosophy, computer security, database, political science, law

Due to the nested nonlinear structure inside neural networks, most existing deep learning models are treated as black boxes, and they are highly vulnerable to adversarial attacks. On the one hand, adversarial examples shed light on the decision‐making process of these opaque models and can therefore be used to probe their interpretability. On the other hand, interpretability can be used as a powerful tool to assist in the generation of adversarial examples by making transparent the relative contribution of each input feature to the final prediction. Recently, a post‐hoc explanatory method, layer‐wise relevance propagation (LRP), has shown significant value in instance‐wise explanations. In this paper, we attempt to optimize the recently proposed explanation‐based attack algorithms (EAAs) on text classification models with LRP. We empirically show that LRP provides good explanations and notably benefits existing EAAs. In addition, we propose a simple but effective LRP‐based EAA, LRPTricker. LRPTricker uses LRP to identify important words and subsequently performs typo‐based perturbations on these words to generate the adversarial texts. Extensive experiments show that LRPTricker significantly reduces the performance of text classification models with very small perturbations while remaining highly scalable.
