Uncertainty quantification (UQ) in natural language generation (NLG) remains an open challenge, one exacerbated by the closed-source nature of the latest large language models (LLMs). This study investigates applying conformal prediction (CP), which can transform any heuristic notion of uncertainty into rigorous prediction sets, to black-box LLMs in open-ended NLG tasks. We introduce a novel uncertainty measure based on self-consistency theory, and then develop a conformal uncertainty criterion by integrating an uncertainty condition aligned with correctness into the CP algorithm. Empirical evaluations indicate that our uncertainty measure outperforms prior state-of-the-art methods. Furthermore, we achieve strict control over the correctness coverage rate across 7 popular LLMs on 4 free-form NLG datasets spanning general-purpose and medical scenarios. Additionally, the small size of the calibrated prediction sets further highlights the efficiency of our method in providing trustworthy guarantees for practical open-ended NLG applications.
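The abstract does not specify the scoring function, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes that self-consistency is approximated by the exact-match frequency of each distinct answer among several sampled responses, that the nonconformity score of a calibration example is one minus the frequency of its reference answer, and that the threshold is the standard split conformal quantile. All function names are hypothetical.

```python
import math
from collections import Counter


def answer_frequencies(samples):
    """Map each distinct sampled answer to its relative frequency.

    Assumption: answers are grouped by normalized exact match; a real
    system would likely need semantic grouping of free-form text.
    """
    counts = Counter(s.strip().lower() for s in samples)
    total = len(samples)
    return {ans: c / total for ans, c in counts.items()}


def nonconformity(samples, reference):
    """Score = 1 - frequency of the reference answer (1.0 if never sampled)."""
    freqs = answer_frequencies(samples)
    return 1.0 - freqs.get(reference.strip().lower(), 0.0)


def conformal_threshold(scores, alpha):
    """Split conformal quantile: the ceil((n+1)(1-alpha))-th smallest score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: no finite threshold.
        return float("inf")
    return sorted(scores)[k - 1]


def prediction_set(samples, threshold):
    """Keep every distinct sampled answer whose score is within the threshold."""
    freqs = answer_frequencies(samples)
    return {ans for ans, f in freqs.items() if 1.0 - f <= threshold}


# Toy calibration data: (sampled responses, reference answer) pairs.
calibration = [
    (["Paris", "Paris", "Paris", "Lyon"], "Paris"),
    (["4", "4", "5", "4"], "4"),
    (["blue", "red", "blue", "green"], "blue"),
    (["a", "b", "c", "a"], "b"),
]
cal_scores = [nonconformity(s, ref) for s, ref in calibration]
q_hat = conformal_threshold(cal_scores, alpha=0.5)
test_set = prediction_set(["x", "x", "y", "z"], q_hat)
```

In this sketch a lower `alpha` raises the threshold and therefore enlarges the prediction set, which is the usual trade-off between coverage and set size. The coverage guarantee of split conformal prediction relies on calibration and test examples being exchangeable, and it can only cover the correct answer when that answer appears among the sampled responses.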