Despite demonstrating impressive capabilities, Large Language Models (LLMs) still often struggle to accurately express the factual knowledge they possess, especially in cases where their knowledge boundaries are ambiguous. To improve LLMs' factual expressions, we propose the UAlign framework, which leverages Uncertainty estimations to represent knowledge boundaries, and then explicitly incorporates these representations as input features in prompts so that LLMs Align with their factual knowledge. First, we construct a dataset of knowledge question-answering (QA) samples by computing two uncertainty estimations, a confidence score and semantic entropy, to represent the knowledge boundaries of LLMs. Subsequently, on the prepared dataset, we train a reward model that incorporates the uncertainty estimations and then employ the Proximal Policy Optimization (PPO) algorithm for factuality alignment of the LLMs. Experimental results indicate that, by integrating uncertainty representations into LLM alignment, the proposed UAlign significantly enhances LLMs' ability to confidently answer known questions and refuse unknown questions on both in-domain and out-of-domain tasks, showing reliability improvements and good generalizability over various prompt-based and training-based baselines.
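To make the two uncertainty estimations concrete, the sketch below shows one common way to derive a confidence score and semantic entropy from multiple answers sampled for a question: cluster the answers by semantic equivalence, take the majority-cluster frequency as confidence, and compute Shannon entropy over the cluster frequencies. This is a minimal illustration under stated assumptions, not the paper's exact recipe; the function names, the greedy clustering, and the `are_equivalent` test (in practice often an NLI model checking bidirectional entailment) are all hypothetical.

```python
import math

def uncertainty_features(sampled_answers, are_equivalent):
    """Illustrative sketch: derive the two uncertainty estimations named in
    the abstract from N answers sampled from an LLM for one question.

    sampled_answers : list[str]  -- sampled answers for a single question
    are_equivalent  : callable   -- semantic-equivalence test between two
                                    answers (hypothetical stand-in here)
    """
    n = len(sampled_answers)

    # Greedily cluster answers into semantic-equivalence classes.
    clusters = []
    for ans in sampled_answers:
        for cluster in clusters:
            if are_equivalent(cluster[0], ans):
                cluster.append(ans)
                break
        else:
            clusters.append([ans])

    # Confidence score: empirical frequency of the majority cluster
    # (one common convention; the paper may define it differently).
    sizes = sorted((len(c) for c in clusters), reverse=True)
    confidence = sizes[0] / n

    # Semantic entropy: Shannon entropy over the cluster frequencies.
    semantic_entropy = -sum((s / n) * math.log(s / n) for s in sizes)
    return confidence, semantic_entropy

# Toy usage, with exact string match standing in for semantic equivalence.
answers = ["Paris", "Paris", "Paris", "Lyon"]
conf, sem_ent = uncertainty_features(answers, lambda a, b: a == b)
print(f"confidence={conf:.2f}, semantic_entropy={sem_ent:.2f}")
```

A low entropy with high confidence marks a question as likely "known" to the model, while a high entropy marks it as near or beyond the knowledge boundary; these per-question features are what the abstract describes feeding into the prompts and the reward model.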