Recent work has shown considerable improvements in task-oriented dialogue (TOD) systems by using pretrained large language models (LLMs) in an end-to-end manner. However, biased behavior in the individual components of a TOD system, together with error propagation in the end-to-end framework, can lead to seriously biased TOD responses. Existing fairness research focuses only on the total bias of a system. In this paper, we propose a diagnosis method that attributes bias to each component of a TOD system. The proposed attribution method gives a deeper understanding of the sources of bias and lets researchers mitigate biased model behavior at a more granular level. We conduct experiments to attribute the TOD system's bias along three demographic axes: gender, age, and race. Experimental results show that the bias of a TOD system usually comes from the response generation model.