Fairness principles embedded in a dialogue system's decision-making and generated responses are crucial for user engagement, satisfaction, and task achievement. The absence of equitable and inclusive principles can hinder the formation of common ground, which in turn degrades the system's overall performance. For example, misusing pronouns in a user interaction may cause ambiguity about the intended subject. Yet there is no comprehensive study of equitable text generation in dialogue. In this work, we use theories of computational learning to study this problem. We provide formal definitions of equity in text generation and prove formal connections between learning human-likeness and learning equity: algorithms for improving equity ultimately reduce to algorithms for improving human-likeness (on augmented data). With this insight, we also formulate reasonable conditions under which text generation algorithms can learn to generate equitable text without any modifications to the biased training data on which they learn. To exemplify our theory in practice, we examine a group of algorithms for the GuessWhat?! visual dialogue game and use this example to test our theory empirically. Our theory accurately predicts the relative performance of multiple algorithms in generating equitable text, as measured by both human and automated evaluation.