Towards Verifiable AI with Lightweight Cryptographic Proofs of Inference

When large AI models are deployed as cloud-based services, clients have no guarantee that responses are correct or were produced by the intended model. Rerunning inference locally is infeasible for large models, and existing cryptographic proof systems -- while providing strong correctness guarantees -- introduce prohibitive prover overhead (e.g., hundreds of seconds per query for billion-parameter models). We present a verification framework and protocol that replaces full cryptographic proofs with a lightweight, sampling-based approach grounded in statistical properties of neural networks. We formalize the conditions under which trace separation between functionally dissimilar models can be leveraged to argue the security of verifiable inference protocols. The prover commits to the execution trace of inference via Merkle-tree-based vector commitments and opens only a small number of entries along randomly sampled paths from output to input. This yields a protocol that trades soundness for efficiency, a tradeoff well-suited to auditing, large-scale deployment settings where repeated queries amplify detection probability, and scenarios with rationally incentivized provers who face penalties upon detection. Our approach reduces proving times by several orders of magnitude compared to state-of-the-art cryptographic proof systems, going from the order of minutes to the order of milliseconds, with moderately larger proofs. Experiments on ResNet-18 classifiers and Llama-2-7B confirm that common architectures exhibit the statistical properties our protocol requires, and that natural adversarial strategies (gradient-descent reconstruction, inverse transforms, logit swapping) fail to produce traces that evade detection. We additionally present a protocol in the refereed delegation model, where two competing servers enable correct output identification in a logarithmic number of rounds.
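The commit-and-open step described above can be illustrated with a minimal sketch of a Merkle-tree vector commitment. This is not the authors' implementation: the hash function (SHA-256), the domain-separation tags, the padding rule, the byte encoding of trace entries, and all function names are assumptions made for illustration. The sketch covers only committing to a trace and opening individual entries; the consistency checks the verifier would run on opened entries along a sampled path are not shown.

```python
import hashlib
import secrets
import struct

# Assumed domain-separation tags, so leaves, inner nodes and padding never collide.
LEAF_TAG, NODE_TAG, PAD_TAG = b"\x00", b"\x01", b"\x02"


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _leaf(value: bytes) -> bytes:
    return _h(LEAF_TAG + value)


def _node(left: bytes, right: bytes) -> bytes:
    return _h(NODE_TAG + left + right)


def build_tree(values):
    """Commit to a vector of byte strings.

    Returns all tree levels: levels[0] holds the leaf hashes (padded to a
    power of two) and levels[-1][0] is the root, i.e. the commitment.
    """
    if not values:
        raise ValueError("cannot commit to an empty trace")
    level = [_leaf(v) for v in values]
    size = 1
    while size < len(level):
        size *= 2
    level += [_h(PAD_TAG)] * (size - len(level))
    levels = [level]
    while len(level) > 1:
        level = [_node(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def open_entry(levels, index):
    """Prover side: return the sibling hashes from leaf `index` up to the root."""
    path = []
    for level in levels[:-1]:
        path.append(level[index ^ 1])  # sibling differs in the lowest bit
        index //= 2
    return path


def verify_entry(root, index, value, path):
    """Verifier side: recompute the root from one opened entry and its path.

    A real verifier would also check len(path) against the known trace length.
    """
    digest = _leaf(value)
    for sibling in path:
        if index % 2 == 0:
            digest = _node(digest, sibling)
        else:
            digest = _node(sibling, digest)
        index //= 2
    return digest == root


def demo():
    # Toy "execution trace": a handful of activations encoded as float32 bytes.
    trace = [struct.pack("<f", x) for x in (0.5, -1.25, 3.0, 0.0, 2.75)]
    levels = build_tree(trace)
    root = levels[-1][0]  # sent to the verifier before any challenge

    # The verifier samples a few positions; the prover opens only those.
    for _ in range(3):
        i = secrets.randbelow(len(trace))
        proof = open_entry(levels, i)
        print(i, verify_entry(root, i, trace[i], proof))


if __name__ == "__main__":
    demo()
```

In this sketch each opening costs a number of hashes logarithmic in the trace length, which is why opening only a few sampled entries keeps the prover's work small relative to proving the whole computation.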
