Deep Learning-Based Method for Expressing the Knowledge Boundaries of Black-Box LLMs

Large Language Models (LLMs) have achieved remarkable success, but content generation distortion (hallucination) limits their practical application. The core cause of hallucination is that LLMs lack awareness of the knowledge they store internally, so, unlike humans, they cannot express their knowledge state when a question falls beyond their internal knowledge boundaries. Existing research on knowledge boundary expression focuses primarily on white-box LLMs, leaving methods suitable for black-box LLMs, which offer only API access without revealing internal parameters, largely unexplored. Against this backdrop, this paper proposes LSCL (LLM-Supervised Confidence Learning), a deep learning-based method for expressing the knowledge boundaries of black-box LLMs. Built on the knowledge distillation framework, the method designs a deep learning model that takes the input question, the output answer, and the token probabilities from a black-box LLM as inputs and constructs a mapping between these inputs and the LLM's internal knowledge state, enabling the knowledge boundaries of the black-box LLM to be quantified and expressed. Experiments on diverse public datasets and with multiple prominent black-box LLMs demonstrate that LSCL effectively helps black-box LLMs express their knowledge boundaries accurately, and that it significantly outperforms existing baseline models on metrics such as accuracy and recall. Furthermore, for scenarios in which a black-box LLM does not support access to token probabilities, an adaptive alternative method is proposed; its performance is close to that of LSCL and surpasses the baseline models.
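
To make the idea of mapping black-box outputs to a confidence score concrete, the following is a minimal sketch and not the LSCL model itself. It assumes only that per-token probabilities of an answer are available and that each training answer has been labelled correct (1) or incorrect (0). It deliberately swaps the paper's deep learning model for a plain logistic regression over two summary features, and it ignores the question and answer text that LSCL also consumes. All function names, features, and the toy data are hypothetical.

```python
import math
from typing import List, Sequence


def token_prob_features(token_probs: Sequence[float]) -> List[float]:
    """Summarize one answer's per-token probabilities as a fixed-length vector."""
    if not token_probs:
        raise ValueError("token_probs must not be empty")
    eps = 1e-12  # guards against log(0)
    log_probs = [math.log(max(p, eps)) for p in token_probs]
    mean_log_prob = sum(log_probs) / len(log_probs)
    return [
        math.exp(mean_log_prob),  # geometric mean of the token probabilities
        min(token_probs),         # probability of the least certain token
        1.0,                      # constant bias term
    ]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_confidence_model(
    answers: Sequence[Sequence[float]],
    labels: Sequence[int],
    lr: float = 0.5,
    epochs: int = 2000,
) -> List[float]:
    """Fit logistic-regression weights with batch gradient descent."""
    features = [token_prob_features(a) for a in answers]
    weights = [0.0] * len(features[0])
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(features, labels):
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj
        weights = [w - lr * g / len(features) for w, g in zip(weights, grad)]
    return weights


def predict_confidence(weights: Sequence[float], token_probs: Sequence[float]) -> float:
    """Estimated probability that the answer lies inside the knowledge boundary."""
    x = token_prob_features(token_probs)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))


# Toy, hand-made data (an assumption for illustration, not from the paper):
# token probabilities of six answers and whether each answer was correct.
TOY_ANSWERS = [
    [0.95, 0.90, 0.97], [0.88, 0.92, 0.90], [0.99, 0.96],
    [0.40, 0.30, 0.55], [0.50, 0.20, 0.60], [0.35, 0.45],
]
TOY_LABELS = [1, 1, 1, 0, 0, 0]

WEIGHTS = train_confidence_model(TOY_ANSWERS, TOY_LABELS)
print(predict_confidence(WEIGHTS, [0.96, 0.93]))
print(predict_confidence(WEIGHTS, [0.30, 0.40]))
```

In use, a score below a chosen threshold (for example 0.5) could trigger an "I am not sure" response instead of returning the answer. The threshold is a design choice that trades accuracy against recall and would need tuning on held-out data.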
