Neural Uncertainty Principle: A Unified View of Adversarial Fragility and LLM Hallucination

Adversarial vulnerability in vision and hallucination in large language models are conventionally viewed as separate problems, each addressed with modality-specific patches. This study is the first to show that they share a common geometric origin: the input and its loss gradient are conjugate observables subject to an irreducible uncertainty bound. We formalize this as a Neural Uncertainty Principle (NUP) under a loss-induced state. In near-bound regimes, further compression must be accompanied by increased sensitivity dispersion (adversarial fragility), while weak prompt-gradient coupling leaves generation under-constrained (hallucination). Crucially, the bound is modulated by an input-gradient correlation channel, which a purpose-built probe captures with a single backward pass. Guided by NUP, we propose ConjMask (masking highly coupled, high-contribution input components) and LogitReg (logit-side regularization), which improve robustness in vision without costly adversarial training. In language, the same probe, applied at the prefill stage, serves as a decoding-free risk signal: it detects hallucination risk before any answer tokens are generated and supports prompt selection. NUP thus recasts two seemingly separate failure taxonomies as a shared uncertainty-budget view, providing a unified, practical framework for diagnosing and mitigating boundary anomalies across perception and generation tasks.
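To make the idea of a single-backward input-gradient probe concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not define the probe, so the toy linear softmax classifier, the use of the cosine between input and gradient as the "correlation" score, and the names `input_gradient`, `coupling_probe`, and `mask_top_k` are all assumptions made for illustration. The masking step is only a loose stand-in for the idea behind ConjMask.

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def input_gradient(W, b, x, label):
    """Gradient of the cross-entropy loss w.r.t. the input x.

    Assumption: a toy linear softmax model, logits = W x + b, so one
    forward pass and one backward pass give the gradient in closed form.
    """
    logits = [sum(w * v for w, v in zip(row, x)) + bi for row, bi in zip(W, b)]
    p = softmax(logits)
    # dLoss/dlogits for cross-entropy is (p - one_hot(label)).
    delta = [pi - (1.0 if i == label else 0.0) for i, pi in enumerate(p)]
    # Backward through the linear layer: g_j = sum_i delta_i * W[i][j].
    return [sum(delta[i] * W[i][j] for i in range(len(W))) for j in range(len(x))]


def coupling_probe(x, g):
    """Per-component contributions x_j * g_j and a global coupling score.

    Assumption: the score is the cosine between x and g; the paper's
    actual probe statistic may differ.
    """
    contrib = [xj * gj for xj, gj in zip(x, g)]
    nx = math.sqrt(sum(v * v for v in x))
    ng = math.sqrt(sum(v * v for v in g))
    cos = sum(contrib) / (nx * ng) if nx > 0 and ng > 0 else 0.0
    return contrib, cos


def mask_top_k(x, contrib, k):
    """Zero out the k components with the largest |x_j * g_j| (hypothetical masking rule)."""
    order = sorted(range(len(x)), key=lambda j: abs(contrib[j]), reverse=True)
    top = set(order[:k])
    return [0.0 if j in top else v for j, v in enumerate(x)]


if __name__ == "__main__":
    W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]  # toy weights, 2 classes x 3 inputs
    b = [0.0, 0.0]
    x = [1.0, 2.0, 0.5]
    g = input_gradient(W, b, x, label=0)
    contrib, cos = coupling_probe(x, g)
    print("coupling score:", cos)
    print("masked input:", mask_top_k(x, contrib, k=1))
```

In a real network the closed-form gradient would be replaced by one automatic-differentiation backward pass; for a language model, the analogous quantity would presumably be computed on prompt embeddings at the prefill stage, before decoding any answer tokens.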
