
Bias Testing and Mitigation in LLM-based Code Generation (Open Access)
Author(s)
Dong Huang,
Qingwen Bu,
Jie Zhang,
Xiaofei Xie,
Junjie Chen,
Heming Cui
Publication year: 2024
Abstract
Utilizing state-of-the-art Large Language Models (LLMs), automatic code generation models play a pivotal role in enhancing the productivity of software development procedures. As the adoption of LLMs becomes more widespread in software coding ecosystems, a pressing issue has emerged: does the generated code contain social bias and unfairness, such as bias related to age, gender, and race? This issue concerns the integrity, fairness, and ethical foundation of software applications that depend on the code generated by these models, yet it is under-explored in the literature. This paper presents a novel bias testing framework that is specifically designed for code generation tasks. Based on this framework, we conduct an extensive evaluation of the bias in code generated by five state-of-the-art LLMs. Our findings reveal that 20.29% to 44.93% of the code functions generated by the models under study are biased when handling bias-sensitive tasks (i.e., tasks that involve sensitive attributes such as age and gender). This indicates that existing LLMs can be unfair in code generation, posing risks of unintended and harmful software behaviors. To mitigate bias in code generation models, we evaluate five bias mitigation prompt strategies: utilizing bias testing results to refine the code (zero-shot), one-shot, few-shot, and two Chain-of-Thought (CoT) prompts. Our evaluation results illustrate that these strategies are all effective in mitigating bias. Overall, one-shot and few-shot learning are the two most effective. For GPT-4, 80% to 90% of code bias can be removed with one-shot learning.
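The mitigation loop the abstract describes (test the generated code for bias, then feed the test result back to the model together with an illustrative example) can be sketched roughly as follows. This is a minimal illustration, not the authors' released framework: `generate_code` stands in for any LLM client, and the keyword-based `find_bias` check and the one-shot example are hypothetical simplifications.

```python
from typing import Callable

# Hypothetical sketch of a test-then-mitigate loop; NOT the paper's actual framework.
SENSITIVE_ATTRIBUTES = ("age", "gender", "race")

ONE_SHOT_EXAMPLE = '''\
# Biased: eligibility depends on gender
def is_eligible(applicant):
    return applicant.income > 50000 and applicant.gender == "male"

# Unbiased revision: only task-relevant attributes are used
def is_eligible(applicant):
    return applicant.income > 50000
'''

def find_bias(code: str) -> list[str]:
    """Toy bias test: flag any sensitive attribute referenced in the generated code."""
    return [attr for attr in SENSITIVE_ATTRIBUTES if attr in code]

def generate_with_mitigation(task: str,
                             generate_code: Callable[[str], str],
                             max_rounds: int = 3) -> str:
    """Generate code, test it for bias, and re-prompt with one-shot feedback if needed."""
    code = generate_code(task)
    for _ in range(max_rounds):
        flagged = find_bias(code)
        if not flagged:
            break  # no sensitive attribute detected; stop refining
        # One-shot style feedback: a biased/unbiased pair plus the concrete test result.
        feedback = (
            f"The following code uses the sensitive attributes {flagged}, "
            f"which makes its behavior unfair.\n"
            f"Here is an example of removing such bias:\n{ONE_SHOT_EXAMPLE}\n"
            f"Rewrite this code so it no longer depends on {flagged}:\n{code}"
        )
        code = generate_code(feedback)
    return code
```

In this sketch, the zero-shot variant would drop ONE_SHOT_EXAMPLE and send only the bias test result, while a few-shot variant would include several biased/unbiased pairs.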
Language(s): English
