Despite the remarkable ability of Large Language Models (LLMs) to answer questions, they often display considerable overconfidence even when a question has no definitive answer. To avoid hallucinated answers to such unknown questions, existing studies typically investigate approaches for refusing to answer them. In this work, we propose a novel and scalable self-alignment method that uses the LLM itself to enhance its response-ability to different types of unknown questions, so that it can not only refuse to answer but also explain why an unknown question is unanswerable. Specifically, the Self-Align method first employs a two-stage class-aware self-augmentation approach to generate a large amount of unknown question-response data. It then conducts disparity-driven self-curation to select qualified data for fine-tuning the LLM itself, aligning its responses to unknown questions as desired. Experimental results on two datasets covering four types of unknown questions validate the superiority of the Self-Align method over existing baselines under three types of task formulation.