To trust the fluent generations of large language models (LLMs), humans must be able to verify their correctness against trusted, external sources. Recent efforts, such as providing citations via retrieved documents or post-hoc provenance, enhance verifiability but offer no guarantee of correctness. To address these limitations, we tackle the verifiability goal with a different philosophy: trivializing the verification process by developing models that quote verbatim statements from trusted sources in their pre-training data. We propose Quote-Tuning, which demonstrates the feasibility of aligning models to quote. The core of Quote-Tuning is a fast membership inference function that efficiently verifies text against trusted corpora. We leverage this tool to design a reward function that quantifies quoting in model responses and to curate datasets for preference learning. Experiments show that Quote-Tuning significantly increases verbatim quoting from high-quality documents, by up to 130% relative to base models, while maintaining response quality. Quote-Tuning applies across different tasks, generalizes to out-of-domain data and diverse model families, and yields additional gains in truthfulness. Our method not only serves as a hassle-free way to increase quoting but also opens up avenues for improving LLM trustworthiness through better verifiability.
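As a rough illustration only (not the paper's implementation), the sketch below shows one way a quoting reward could be computed: score a response by the fraction of its tokens covered by n-gram spans that appear verbatim in a trusted corpus, then use the scores to rank candidate responses into preference pairs. The function names (`build_ngram_index`, `quote_reward`), the span length `N`, and the in-memory set standing in for the fast membership-inference structure are all assumptions made for the example.

```python
# Hypothetical sketch: a quoting reward that scores how much of a response
# is covered by verbatim n-gram spans found in a trusted corpus.
# An in-memory set stands in for the fast membership-inference structure
# mentioned in the abstract (an efficient version might use a compact
# probabilistic index over corpus n-grams instead).

from typing import Iterable, Set, Tuple

N = 8  # span length (in tokens) counted as a verbatim quote; illustrative choice


def build_ngram_index(corpus_docs: Iterable[str], n: int = N) -> Set[Tuple[str, ...]]:
    """Collect all token n-grams from the trusted corpus."""
    index: Set[Tuple[str, ...]] = set()
    for doc in corpus_docs:
        tokens = doc.split()
        for i in range(len(tokens) - n + 1):
            index.add(tuple(tokens[i:i + n]))
    return index


def quote_reward(response: str, index: Set[Tuple[str, ...]], n: int = N) -> float:
    """Fraction of response tokens covered by at least one corpus-matching n-gram."""
    tokens = response.split()
    if len(tokens) < n:
        return 0.0
    covered = [False] * len(tokens)
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n]) in index:
            for j in range(i, i + n):
                covered[j] = True
    return sum(covered) / len(tokens)


# Usage: rank two candidate responses by quoting to form a preference pair.
corpus = ["the quick brown fox jumps over the lazy dog near the river bank today"]
idx = build_ngram_index(corpus)
chosen = "the quick brown fox jumps over the lazy dog near the river"
rejected = "a fast auburn fox leaped across a sleepy hound"
print(quote_reward(chosen, idx), quote_reward(rejected, idx))  # 1.0 vs. 0.0
```

In this toy setup, the response that quotes the corpus verbatim receives the higher reward and would be labeled as the preferred completion when curating preference-learning data.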