Despite showing increasingly human-like abilities, large language models (LLMs) often struggle with factual inaccuracies, i.e., "hallucinations," even when they hold relevant knowledge. To address these hallucinations, current approaches typically require high-quality human factuality annotations. In this work, we explore Self-Alignment for Factuality, in which we leverage the self-evaluation capability of an LLM to provide training signals that steer the model toward factuality. Specifically, we incorporate Self-Eval, a self-evaluation component, to prompt an LLM to validate the factuality of its own generated responses based solely on its internal knowledge. Additionally, we design Self-Knowledge Tuning (SK-Tuning) to augment the LLM's self-evaluation ability by improving its confidence estimation and calibration. We then use these self-annotated responses to fine-tune the model via the Direct Preference Optimization (DPO) algorithm. We show that the proposed self-alignment approach substantially enhances the factual accuracy of Llama-family models across three key knowledge-intensive tasks on TruthfulQA and BioGEN.
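To make the described pipeline concrete, the sketch below shows one way the self-annotation step could be wired up: the model is prompted to judge each of its own sampled answers as factual or not, and the judged samples are paired into (chosen, rejected) preference data for DPO. This is a minimal illustration under stated assumptions; the `query_llm` interface, the prompt wording, and the pairing heuristic are placeholders, not the paper's exact implementation.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical interface: any function mapping a prompt string to a
# model completion (e.g., a locally served Llama model).
QueryFn = Callable[[str], str]

# Illustrative self-evaluation prompt; the paper's wording may differ.
SELF_EVAL_PROMPT = (
    "Question: {question}\n"
    "Proposed answer: {answer}\n"
    "Based only on your internal knowledge, is the proposed answer "
    "factually correct? Reply with exactly 'True' or 'False'."
)

@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # response the model judged factual
    rejected: str  # response the model judged non-factual

def self_eval(query_llm: QueryFn, question: str, answer: str) -> bool:
    """Self-Eval step: ask the model to validate its own answer."""
    verdict = query_llm(SELF_EVAL_PROMPT.format(question=question, answer=answer))
    return verdict.strip().lower().startswith("true")

def build_dpo_pairs(query_llm: QueryFn, question: str,
                    samples: List[str]) -> List[PreferencePair]:
    """Split sampled responses by self-judged factuality and pair them
    up as (chosen, rejected) examples for DPO fine-tuning."""
    judged = [(s, self_eval(query_llm, question, s)) for s in samples]
    factual = [s for s, ok in judged if ok]
    nonfactual = [s for s, ok in judged if not ok]
    return [PreferencePair(question, c, r) for c, r in zip(factual, nonfactual)]
```

The resulting pairs can then be passed to any standard DPO implementation (e.g., TRL's DPOTrainer), which minimizes the usual objective $-\log\sigma\!\left(\beta\log\frac{\pi_\theta(y_w\mid x)}{\pi_{\mathrm{ref}}(y_w\mid x)} - \beta\log\frac{\pi_\theta(y_l\mid x)}{\pi_{\mathrm{ref}}(y_l\mid x)}\right)$ over the (chosen, rejected) pairs.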