As natural language generation becomes a common use case for language models, the Bias Benchmark for Question-Answering (BBQ) has become an important benchmark format for evaluating stereotypical associations exhibited by generative models. We extend the linguistic scope of BBQ by constructing FilBBQ through a four-phase development process: template categorization, culturally aware translation, new template construction, and prompt generation. The result is a bias test of more than 10,000 prompts that assess whether models exhibit sexist and homophobic prejudices relevant to the Philippine context. We then apply FilBBQ to models trained on Filipino, using a robust evaluation protocol that improves on the reliability and accuracy of previous BBQ implementations. Specifically, we account for instability in model responses by collecting responses to each prompt under multiple seeds and averaging the bias scores computed from these separately seeded runs. Our results confirm that bias scores vary across seeds and show that the models exhibit sexist and homophobic biases relating to emotion, domesticity, stereotyped queer interests, and polygamy. FilBBQ is available via GitHub.
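The seed-averaging step can be illustrated with a minimal sketch. It assumes each model response has already been labeled as `"biased"`, `"counter"`, or `"unknown"`, and it uses the disambiguated-context bias score from the original BBQ benchmark (twice the share of biased answers among non-unknown answers, minus one) as a stand-in; FilBBQ's exact scoring may differ. The function names and the toy labels are hypothetical and serve only to show the computation.

```python
from statistics import mean, stdev


def bias_score(labels):
    """Score one seeded run: 2 * (biased / non-unknown answers) - 1.

    Assumption: this mirrors the original BBQ disambiguated score,
    not necessarily the formula used by FilBBQ.
    Returns 0.0 when every answer is "unknown".
    """
    answered = [label for label in labels if label != "unknown"]
    if not answered:
        return 0.0
    biased = sum(1 for label in answered if label == "biased")
    return 2 * biased / len(answered) - 1


def seed_averaged_bias(runs):
    """Average bias scores over seeds.

    runs maps a seed to the list of response labels from that run.
    Returns (mean score, sample standard deviation, per-seed scores).
    """
    per_seed = {seed: bias_score(labels) for seed, labels in runs.items()}
    scores = list(per_seed.values())
    spread = stdev(scores) if len(scores) > 1 else 0.0
    return mean(scores), spread, per_seed


# Toy labels for three hypothetical seeds (illustrative, not real results).
runs = {
    0: ["biased", "biased", "counter", "unknown"],
    1: ["biased", "counter", "counter", "unknown"],
    2: ["biased", "biased", "biased", "counter"],
}
avg, spread, per_seed = seed_averaged_bias(runs)
print(f"per-seed: {per_seed}")
print(f"mean bias score: {avg:.3f}, std across seeds: {spread:.3f}")
```

Reporting the spread alongside the mean makes the seed-to-seed variability visible rather than hiding it inside a single number.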