Despite showing increasingly human-like abilities, large language models (LLMs) often produce factual inaccuracies, i.e., "hallucinations", even when they hold the relevant knowledge. Current approaches to mitigating hallucinations typically require high-quality human factuality annotations. In this work, we explore Self-Alignment for Factuality, in which we leverage the self-evaluation capability of an LLM to provide training signals that steer the model towards factuality. Specifically, we incorporate Self-Eval, a self-evaluation component, to prompt an LLM to validate the factuality of its own generated responses based solely on its internal knowledge. Additionally, we design Self-Knowledge Tuning (SK-Tuning) to strengthen the LLM's self-evaluation ability by improving the model's confidence estimation and calibration. We then use these self-annotated responses to fine-tune the model with the Direct Preference Optimization (DPO) algorithm. We show that the proposed self-alignment approach substantially improves the factual accuracy of Llama family models across three key knowledge-intensive tasks on TruthfulQA and BioGEN.
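The pipeline described above (self-evaluate sampled responses, turn the scores into preference pairs, then optimize with DPO) can be illustrated with a minimal sketch. It is not the authors' implementation. The function names `build_preference_pairs` and `dpo_loss`, the `margin` parameter, and the default `beta=0.1` are hypothetical choices made for illustration, and the confidence scores are assumed to have already been produced by a Self-Eval-style component. The loss is the standard per-pair DPO objective, written in plain Python on scalar sequence log-probabilities.

```python
import math
from itertools import combinations


def build_preference_pairs(responses, confidences, margin=0.0):
    """Turn self-evaluated responses into (chosen, rejected) pairs.

    `confidences` are assumed to be the model's own factuality scores
    for each response (hypothetical input; how they are computed is
    outside this sketch). Pairs whose scores differ by no more than
    `margin` are skipped because they carry no clear preference.
    """
    pairs = []
    scored = list(zip(responses, confidences))
    for (r_a, c_a), (r_b, c_b) in combinations(scored, 2):
        if abs(c_a - c_b) <= margin:
            continue
        chosen, rejected = (r_a, r_b) if c_a > c_b else (r_b, r_a)
        pairs.append((chosen, rejected))
    return pairs


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair.

    Computes -log(sigmoid(beta * (policy margin - reference margin))),
    where each margin is log p(chosen) - log p(rejected).
    """
    policy_margin = policy_chosen_logp - policy_rejected_logp
    ref_margin = ref_chosen_logp - ref_rejected_logp
    x = beta * (policy_margin - ref_margin)
    # Numerically stable softplus(-x), which equals -log(sigmoid(x)).
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


if __name__ == "__main__":
    # Toy responses with made-up self-evaluation scores.
    pairs = build_preference_pairs(
        ["response A", "response B", "response C"],
        [0.9, 0.2, 0.9],
    )
    print(pairs)
    # Toy log-probabilities; in practice these come from the policy
    # and a frozen reference model.
    print(dpo_loss(-1.0, -5.0, -3.0, -3.0))
```

In this sketch, responses with equal confidence never form a pair, and the loss falls below log 2 once the policy prefers the chosen response more strongly than the reference model does.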