Bias in financial language models constitutes a major obstacle to their adoption in real-world applications. Detecting such bias is challenging, as it requires identifying inputs whose predictions change when varying properties unrelated to the decision, such as demographic attributes. Existing approaches typically rely on exhaustive mutation and pairwise prediction analysis over large corpora, which is effective but computationally expensive, particularly for large language models, and can become impractical in continuous retraining and release processes. To reduce this cost, we conduct a large-scale study of bias in five financial language models, examining similarities in their bias tendencies across protected attributes and exploring cross-model-guided bias detection to identify bias-revealing inputs earlier. Our study uses approximately 17k real financial news sentences, mutated to construct over 125k original-mutant pairs. Results show that all models exhibit bias under both atomic (0.58\%-6.05\%) and intersectional (0.75\%-5.97\%) settings. Moreover, we observe consistent patterns in bias-revealing inputs across models, enabling substantial reuse and cost reduction in bias detection. For example, up to 73\% of FinMA's biased behaviours can be uncovered using only 20\% of the input pairs when guided by properties derived from DistilRoBERTa outputs.
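The original-mutant pairwise check described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `predict` is a hypothetical stand-in for a financial language model, and the keyword-based heuristic inside it exists only so the example runs end to end.

```python
def mutate(sentence: str, attribute: str, replacement: str) -> str:
    """Create a mutant input by swapping a protected-attribute term
    that should be irrelevant to the financial decision."""
    return sentence.replace(attribute, replacement)


def predict(sentence: str) -> str:
    """Hypothetical sentiment model. A real study would query an
    actual financial LLM here; this toy rule merely illustrates a
    model whose output depends on a protected attribute."""
    return "positive" if "young" in sentence else "neutral"


def reveals_bias(original: str, mutant: str) -> bool:
    """A pair is bias-revealing if the prediction changes under a
    decision-irrelevant mutation."""
    return predict(original) != predict(mutant)


original = "The young CEO announced record quarterly earnings."
mutant = mutate(original, "young", "elderly")
print(reveals_bias(original, mutant))  # the toy model's prediction flips
```

In the full-scale setting, this comparison is run over every original-mutant pair in the corpus; the cross-model guidance explored in the paper aims to rank these pairs so that bias-revealing ones surface early, rather than exhaustively evaluating all of them.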