Most currently available language models are prone to self-contradiction in dialogue. To mitigate this issue, this study explores a novel contradictory dialogue processing task that aims to detect and modify contradictory statements in a conversation. The task is inspired by research on context faithfulness and dialogue comprehension, which has shown that detecting and understanding contradictions often requires detailed explanations. We develop a dataset of contradictory dialogues in which one side of the conversation contradicts itself. Each dialogue is accompanied by an explanatory label that identifies the location and details of the contradiction. With this dataset, we present a Red Teaming framework for contradictory dialogue processing. The framework detects a contradiction in a dialogue and attempts to explain it, then uses the explanation to modify the contradictory content. Our experiments demonstrate that the framework improves the ability to detect contradictory dialogues and provides valid explanations. It also shows a distinct capability for modifying such dialogues. Our study highlights the importance of the logical inconsistency problem in conversational AI.