Detecting stereotypes and biases in Large Language Models (LLMs) is crucial for enhancing fairness and reducing adverse impacts on individuals or groups when these models are applied. Traditional methods, which rely on embedding spaces or probability metrics, fall short of revealing the nuanced and implicit biases present across varied contexts. To address this challenge, we propose the FairMonitor framework, which adopts a static-dynamic detection method for a comprehensive evaluation of stereotypes and biases in LLMs. The static component consists of a direct inquiry test, an implicit association test, and an unknown situation test, comprising 10,262 open-ended questions that cover 9 sensitive factors and 26 educational scenarios; it is effective for evaluating both explicit and implicit biases. Moreover, we use a multi-agent system to construct dynamic scenarios for detecting subtle biases in more complex and realistic settings. This component detects biases based on the interaction behaviors of LLMs across 600 varied educational scenarios. The experimental results show that combining the static and dynamic methods can detect more stereotypes and biases in LLMs.