Large language models are increasingly deployed in settings where reliability matters, yet output-level uncertainty signals such as token probabilities, entropy, and self-consistency can become brittle under calibration--deployment mismatch. Conformal prediction provides finite-sample validity under exchangeability, but its practical usefulness depends on the quality of the nonconformity score. We propose a conformal framework for LLM question answering that uses internal representations rather than output-facing statistics: specifically, we introduce Layer-Wise Information (LI) scores, which measure how conditioning on the input reshapes predictive entropy across model depth, and use them as nonconformity scores within a standard split conformal pipeline. Across closed-ended and open-domain QA benchmarks, with the clearest gains under cross-domain shift, our method achieves a better validity--efficiency trade-off than strong text-level baselines while maintaining competitive in-domain reliability at the same nominal risk level. These results suggest that internal representations can provide more informative conformal scores when surface-level uncertainty is unstable under distribution shift.
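For readers unfamiliar with the "standard split conformal pipeline" mentioned above, the following is a minimal sketch of its calibration and prediction steps. It assumes nonconformity scores have already been computed; the abstract does not specify how LI scores are calculated, so the scores here are placeholder numbers, and the function names `conformal_threshold` and `prediction_set` are hypothetical, not taken from the paper.

```python
import math

def conformal_threshold(cal_scores, alpha):
    """Return the split-conformal threshold at nominal risk level alpha.

    This is the ceil((n + 1) * (1 - alpha))-th smallest calibration score,
    the usual finite-sample-corrected quantile.
    """
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: only the trivial
        # (keep-everything) set carries the coverage guarantee.
        return math.inf
    return sorted(cal_scores)[k - 1]

def prediction_set(candidate_scores, threshold):
    """Keep every candidate answer whose nonconformity score is <= threshold."""
    return [ans for ans, score in candidate_scores.items() if score <= threshold]

# Placeholder calibration scores (assumption: one score per held-out QA pair,
# computed on the true answer by whatever nonconformity score is in use).
cal_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
q_hat = conformal_threshold(cal_scores, alpha=0.5)

# Placeholder scores for the candidate answers to one test question.
answers = prediction_set({"a": 0.3, "b": 0.7}, q_hat)
```

Under exchangeability of calibration and test points, the set returned by `prediction_set` contains the correct answer with probability at least `1 - alpha`. A more informative score tends to yield smaller sets at the same risk level, which is the validity versus efficiency trade-off the abstract refers to.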