As language models become increasingly integrated into our digital lives, Personalized Text Generation (PTG) has emerged as a pivotal component with a wide range of applications. However, the bias inherent in user-written text, which is often used to train PTG models, can inadvertently associate different levels of linguistic quality with users' protected attributes. A model trained on such text can inherit this bias and perpetuate inequality in the text it generates with respect to users' protected attributes, leading to unfair treatment when serving users. In this work, we investigate the fairness of PTG in the context of personalized explanation generation for recommendations. We first discuss the biases in generated explanations and their fairness implications. To promote fairness, we introduce a general framework for achieving measure-specific counterfactual fairness in explanation generation. Extensive experiments and human evaluations demonstrate the effectiveness of our method.