Large Language Models (LLMs) exhibit impressive capabilities yet remain sensitive to slight variations in input context, which hampers their reliability. Conventional metrics such as accuracy and perplexity fail to assess local prediction robustness, as normalized output probabilities can obscure how resilient an LLM's internal state is to perturbation. We introduce the Token Constraint Bound ($δ_{\mathrm{TCB}}$), a novel metric that quantifies the maximum internal-state perturbation an LLM can withstand before its dominant next-token prediction changes. Intrinsically linked to the geometry of the output embedding space, $δ_{\mathrm{TCB}}$ offers insight into the stability of the model's internal predictive commitment. Our experiments show that $δ_{\mathrm{TCB}}$ correlates with effective prompt engineering and uncovers critical prediction instabilities that perplexity misses during in-context learning and text generation. $δ_{\mathrm{TCB}}$ thus provides a principled, complementary approach for analyzing and potentially improving the contextual stability of LLM predictions.
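One plausible reading of such a bound, sketched below under our own assumptions (the paper's exact definition may differ): treat $δ_{\mathrm{TCB}}$ as the smallest $L_2$ perturbation of the final hidden state that lets a competitor token's logit overtake the current top token's, i.e. the distance from the hidden state to the nearest pairwise decision boundary in the output embedding space. The function name and arguments are illustrative, not from the paper.

```python
import numpy as np

def token_constraint_bound(h, W, b=None):
    """Sketch of a margin-style token constraint bound.

    h : (d,)   final hidden state before the unembedding projection
    W : (V, d) output embedding (unembedding) matrix
    b : (V,)   optional output bias

    Returns (delta, top): the smallest L2 perturbation of h that flips
    the argmax token, and the index of the current top token.
    """
    logits = W @ h + (b if b is not None else 0.0)
    top = int(np.argmax(logits))
    # For each competitor j, the logit gap closes fastest along the
    # direction (w_top - w_j); the distance to that boundary is
    # (logit_top - logit_j) / ||w_top - w_j||.
    diffs = W[top] - W                      # (V, d)
    norms = np.linalg.norm(diffs, axis=1)   # (V,)
    margins = (logits[top] - logits) / np.where(norms > 0, norms, np.inf)
    margins[top] = np.inf                   # exclude the top token itself
    return float(margins.min()), top
```

A small margin means the model's predictive commitment sits close to a decision boundary, so even a tiny perturbation of the hidden state, and hence a slight context change, can flip the dominant token, which is exactly the instability that normalized probabilities can mask.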