Large Language Models (LLMs) are notorious for blending fact with fiction and generating non-factual content, known as hallucinations. To tackle this challenge, we propose an interactive system that helps users gain insight into the reliability of generated text. Our approach builds on the idea that the self-consistency of multiple samples generated by the same LLM relates to its confidence in the individual claims those samples contain. Using this idea, we design RELIC, an interactive system that enables users to investigate and verify semantic-level variations across multiple long-form responses. This allows users to recognize potentially inaccurate information in the generated text and make the necessary corrections. In a user study with ten participants, we show that our approach helps users better verify the reliability of generated text. We further summarize the design implications and lessons learned from this research to inspire future studies on reliable human-LLM interaction.
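The self-consistency idea can be illustrated with a minimal sketch: score each claim by the fraction of sampled responses that support it. This is not RELIC's implementation. The function names, the toy samples, and the token-overlap support check are all assumptions made for illustration; a real system would presumably use a semantic comparison rather than word overlap.

```python
import re


def tokens(text: str) -> set[str]:
    """Return the lowercased alphanumeric word tokens of a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def is_supported(claim: str, sample: str, threshold: float = 1.0) -> bool:
    """Crude stand-in for a semantic support check (an assumption):
    the share of claim tokens that also appear in the sample."""
    claim_tokens = tokens(claim)
    if not claim_tokens:
        return False
    overlap = len(claim_tokens & tokens(sample)) / len(claim_tokens)
    return overlap >= threshold


def consistency_score(claim: str, samples: list[str]) -> float:
    """Fraction of samples that support the claim (0.0 if there are none)."""
    if not samples:
        return 0.0
    return sum(is_supported(claim, s) for s in samples) / len(samples)


# Made-up responses standing in for several samples from the same LLM.
samples = [
    "The museum opened in 1950 and has three floors.",
    "The museum opened in 1950.",
    "The museum opened in 1962.",
]

score_year = consistency_score("The museum opened in 1950", samples)
score_floors = consistency_score("The museum has three floors", samples)
print(f"opened in 1950: {score_year:.2f}")
print(f"three floors:   {score_floors:.2f}")
```

In this toy setup, a claim that most samples repeat receives a higher score than one that appears in a single sample, which is the kind of signal a user could inspect before trusting or correcting a statement.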