The Security Budget of Code-LLM Prompt Hardening: Provable Limits Under Pass-Only Acceptance

We give a quantitative impossibility result for pass-only prompt hardening of code LLMs. For any deterministic prompt filter $h$ and a registered family of finite executable-equivalence task variables $\mathcal Y_{\mathrm{exec}}$, the shared filtered-prompt channel $\rmI(h(p);h(\tilde p))$ is lower-bounded by a worst-$Y$ Fano floor. On HumanEval and MBPP, at task-collapse tolerance $\eta=0.05$, the universal pass-only floor evaluates to $\mathcal F^{\mathrm{op}}\ge 0.84$ and $1.20$ nats, and the identity row realizes $\mathcal F^{\mathrm{id}}\ge 1.67$ and $1.80$ nats. An estimator-invariance corollary lifts the floor to any deterministic embedding pipeline; a dataset-agnostic corollary states the floor in terms of visible-spec entropy and is empirically witnessed by $V(p)$-invariance in $164/164$ HumanEval+ and $224/224$ MBPP+ cases. We operationalize the floor as the \emph{Tri-Audit Protocol}, a two-axis reporting protocol that separates a prompt-side deductive registry attribute (Shannon nats on the visible-spec representation) from a model-side empirical proxy (KSG-1 primary, MINE secondary, on hidden states). A constrained best-of-family search over deterministic and guarded learned filters on CodeLlama-7B, Qwen2.5-Coder-7B/1.5B and DeepSeek-Coder-6.7B at $n=164$ yields the \emph{Cross-Model Tri-Audit Invariance}: of twenty-eight pass-preserving rows, twelve antecedent-preserving deterministic rows fail proxy-axis leakage reduction on every backbone with sign-invariant positive deviations, twelve antecedent-changed-of-record learned-canonicalizer rows fail proxy-axis leakage on every backbone, and four antecedent-violating rows are reported as registered-family collapse. No filter produces a shared Tri-pass on a nine-cell gate-sensitivity sweep. Pass@1 alone therefore cannot certify code-LLM prompt hardening.
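
For orientation, the generic shape of a Fano-type floor can be sketched as follows. This is the textbook inequality, not necessarily the exact form of $\mathcal F^{\mathrm{op}}$ or $\mathcal F^{\mathrm{id}}$ used here; the symbols $X$, $\hat Y$, $M$, $P_e$ and $h_b$ are illustrative assumptions introduced only for this sketch. Let $Y$ take $M\ge 2$ values, let $\hat Y$ be any estimate of $Y$ computed from an observation $X$, and let $P_e=\Pr[\hat Y\ne Y]\le\eta\le 1-1/M$. With $h_b$ the binary entropy in nats, Fano's inequality and data processing give
\[
\rmI(X;Y)\;\ge\;\rmI(\hat Y;Y)\;=\;H(Y)-H(Y\mid\hat Y)\;\ge\;H(Y)-h_b(\eta)-\eta\log(M-1),
\qquad
h_b(\eta)=-\eta\log\eta-(1-\eta)\log(1-\eta).
\]
At $\eta=0.05$ the binary-entropy term is $h_b(0.05)\approx 0.199$ nats, so under these assumptions the floor sits roughly $0.199+0.05\log(M-1)$ nats below $H(Y)$. How the result instantiates $X$ and $Y$ for the filtered-prompt pair $(h(p),h(\tilde p))$ and takes the worst case over $\mathcal Y_{\mathrm{exec}}$ is not reproduced in this sketch.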
