Large language models present challenges for principled uncertainty quantification, in part due to their complexity and the diversity of their outputs. Semantic dispersion, the variance in the meaning of sampled answers, has been proposed as a useful proxy for model uncertainty, but its computational cost prohibits its use in latency-critical applications. We show that sampled semantic distributions can be distilled into lightweight student models that estimate a prompt-conditioned uncertainty before the language model generates an answer token. The student model predicts a semantic distribution over possible answers: the entropy of this distribution provides an effective uncertainty signal for hallucination prediction, and its probability density allows candidate answers to be evaluated for reliability. On TriviaQA, our student models match or outperform finite-sample semantic dispersion for hallucination prediction and provide a strong signal for out-of-domain answer detection. We term this technique Semantic Self-Distillation (SSD) and suggest that it provides a general framework for distilling predictive uncertainty in complex output spaces beyond language.
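The entropy-based uncertainty signal described above can be sketched minimally. The abstract does not specify the student architecture or how semantic clusters are defined, so the function name and the example distributions below are purely illustrative: they assume a student that outputs a categorical distribution over a small set of semantic answer clusters.

```python
import math

def semantic_entropy(probs):
    """Shannon entropy (nats) of a predicted distribution over semantic clusters."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Hypothetical student outputs: distributions over 4 semantic answer clusters.
confident = [0.9, 0.05, 0.03, 0.02]   # mass concentrated on one meaning
uncertain = [0.25, 0.25, 0.25, 0.25]  # mass spread across meanings

# Higher entropy -> more semantic dispersion -> higher hallucination risk.
print(semantic_entropy(confident))  # low entropy
print(semantic_entropy(uncertain))  # maximal entropy, log(4)
```

In this sketch, thresholding the entropy value would yield a hallucination predictor that runs before any answer token is generated, which is the latency advantage the abstract claims over sampling-based semantic dispersion.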