Meta-Fair：基于AI辅助的大型语言模型公平性测试方法 (Meta-Fair: AI-Assisted Fairness Testing of Large Language Models)

Fairness--the absence of unjustified bias--is a core principle in the development of Artificial Intelligence (AI) systems, yet it remains difficult to assess and enforce. Current approaches to fairness testing in large language models (LLMs) often rely on manual evaluation, fixed templates, deterministic heuristics, and curated datasets, making them resource-intensive and difficult to scale. This work aims to lay the groundwork for a novel, automated method for testing fairness in LLMs, reducing the dependence on domain-specific resources and broadening the applicability of current approaches. Our approach, Meta-Fair, is based on two key ideas. First, we adopt metamorphic testing to uncover bias by examining how model outputs vary in response to controlled modifications of input prompts, defined by metamorphic relations (MRs). Second, we propose exploiting the potential of LLMs for both test case generation and output evaluation, leveraging their capability to generate diverse inputs and classify outputs effectively. The proposal is complemented by three open-source tools supporting LLM-driven generation, execution, and evaluation of test cases. We report the findings of several experiments involving 12 pre-trained LLMs, 14 MRs, 5 bias dimensions, and 7.9K automatically generated test cases. The results show that Meta-Fair is effective in uncovering bias in LLMs, achieving an average precision of 92% and revealing biased behaviour in 29% of executions. Additionally, LLMs prove to be reliable and consistent evaluators, with the best-performing models achieving F1-scores of up to 0.79. Although non-determinism affects consistency, these effects can be mitigated through careful MR design. While challenges remain to ensure broader applicability, the results indicate a promising path towards an unprecedented level of automation in LLM testing.

翻译：公平性——即不存在不合理偏见——是人工智能（AI）系统开发的核心原则，但其评估与执行仍面临困难。当前针对大型语言模型（LLM）的公平性测试方法通常依赖人工评估、固定模板、确定性启发式规则和精选数据集，导致资源消耗大且难以扩展。本研究旨在为一种新型自动化LLM公平性测试方法奠定基础，以降低对领域特定资源的依赖，并拓宽现有方法的适用范围。我们提出的Meta-Fair方法基于两个核心思想：首先，采用蜕变测试技术，通过分析模型输出在受控输入提示修改（由蜕变关系定义）下的变化来揭示偏见；其次，利用LLM在测试用例生成和输出评估方面的潜力，充分发挥其生成多样化输入和有效分类输出的能力。该方案辅以三个开源工具，支持LLM驱动的测试用例生成、执行与评估。我们报告了涉及12个预训练LLM、14条蜕变关系、5个偏见维度和7.9K个自动生成测试用例的多组实验结果。结果表明，Meta-Fair能有效揭示LLM中的偏见，平均精确率达92%，并在29%的执行中检测到偏见行为。此外，LLM被证明是可靠且一致的评估器，最佳模型的F1分数最高可达0.79。虽然非确定性会影响一致性，但通过精心设计蜕变关系可缓解此影响。尽管确保更广泛适用性仍存在挑战，但实验结果展现了实现LLM测试自动化新突破的可行路径。