The rapid advancement of large language models (LLMs) and their growing integration into daily life underscore the importance of evaluating and ensuring their fairness. In this work, we examine fairness within the domain of emotional theory of mind, investigating whether LLMs exhibit gender bias when presented with a description of a person and their environment and asked, "How does this person feel?" Furthermore, we propose and evaluate several debiasing strategies, demonstrating that meaningful reductions in bias require training-based interventions rather than inference-time, prompt-based approaches such as prompt engineering alone.