Large language models have been widely adopted in natural language processing, yet they remain prone to generating unreliable content. Recent work aims to reduce misinformation and hallucination by resorting to attribution, i.e., providing citations as supporting evidence. However, current attribution methods usually focus on the retrieval stage and automatic evaluation, neglecting to mirror the citation mechanisms of human scholarly writing that bolster credibility. In this paper, we address these challenges by modelling the attribution task as preference learning and introducing an Automatic Preference Optimization (APO) framework. First, we curate a post-training collection of 6,330 examples by gathering and filtering data from existing datasets. Second, given the high cost of labelling preference data, we propose an automatic method to synthesize attribution preference data, yielding 95,263 pairs. Moreover, inspired by the human citation process, we propose a progressive preference optimization method that leverages fine-grained information. Extensive experiments on three datasets (i.e., ASQA, StrategyQA, and ELI5) demonstrate that APO achieves state-of-the-art citation F1 with higher answer quality.
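To make the preference-learning formulation concrete, the following is a minimal sketch assuming a standard direct preference optimization (DPO) objective; the paper's actual objective, and in particular its progressive fine-grained variant, may differ. Given a question $x$ paired with a preferred attributed answer $y_w$ and a dispreferred one $y_l$ drawn from the synthesized preference set $\mathcal{D}$, the policy $\pi_\theta$ is optimized against a frozen reference model $\pi_{\mathrm{ref}}$:

\[
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) = -\,\mathbb{E}_{(x, y_w, y_l) \sim \mathcal{D}}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
\]

where $\sigma$ is the logistic function and $\beta$ is a temperature controlling the deviation from the reference model. Under this framing, an answer with well-grounded citations plays the role of $y_w$ and one with missing or unsupported citations the role of $y_l$.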