We observe that current conversational language models often waver in their judgments when faced with follow-up questions, even when their original judgments are correct. This wavering poses a significant challenge to generating reliable responses and building user trust. To assess this issue comprehensively, we introduce a \textsc{Follow-up Questioning Mechanism} along with two metrics that quantify this inconsistency, confirming its widespread presence in current language models. To mitigate the issue, we explore various prompting strategies for closed-source models; moreover, we develop a training-based framework, \textsc{Unwavering-FQ}, that teaches language models to maintain their originally correct judgments through synthesized high-quality preference data. Our experimental results confirm the effectiveness of the framework and its ability to enhance models' general capabilities.