Most currently available language models are prone to contradicting themselves in dialogue. To mitigate this issue, this study explores a novel contradictory dialogue processing task that aims to detect and modify contradictory statements in a conversation. This task is inspired by research on context faithfulness and dialogue comprehension, which has shown that detecting and understanding contradictions often requires detailed explanations. We develop a dataset of contradictory dialogues in which one side of the conversation contradicts itself. Each dialogue is accompanied by an explanatory label that identifies the location and details of the contradiction. Using this dataset, we present a Red Teaming framework for contradictory dialogue processing. The framework detects contradictory dialogues and attempts to explain the contradiction, then uses the explanation to modify the contradictory content. Our experiments show that the framework improves the detection of contradictory dialogues and provides valid explanations. It also demonstrates a distinct ability to modify such dialogues. Our study highlights the importance of the logical inconsistency problem in conversational AI.