Decentralized inference provides a scalable and resilient paradigm for serving large language models (LLMs), enabling distributed resource utilization and reducing reliance on centralized providers. However, in a permissionless environment without trusted nodes, ensuring the correctness of model outputs remains a core challenge. We introduce VeriLLM, a publicly verifiable protocol for decentralized LLM inference that achieves security under a one-honest-verifier assumption while maintaining practical efficiency. VeriLLM combines lightweight empirical rerunning with cryptographic commitments, allowing verifiers to validate results at approximately 1% of the underlying inference cost. To prevent verification bottlenecks, we design an isomorphic inference-verification architecture that multiplexes both inference and verification roles across the same GPU workers. This design (i) improves GPU utilization and overall throughput, (ii) enlarges the effective validator set, enhancing robustness and liveness, and (iii) enforces task indistinguishability to prevent node-specific optimizations or selective behavior. Through theoretical analysis and system-level evaluation, we show that VeriLLM achieves reliable public verifiability with minimal overhead, offering a practical foundation for trustworthy and scalable decentralized LLM inference.
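For intuition, the minimal sketch below illustrates the general commit-then-spot-check pattern the abstract alludes to: a prover commits to values derived from its inference run, and a verifier reruns only a small sample of token positions and checks consistency with the commitment. The `model.logits` interface, the hash-based `commit` helper, and the position-sampling strategy are illustrative assumptions for exposition, not the protocol specified in the paper.

```python
import hashlib

def commit(values, nonce: bytes) -> str:
    # Illustrative hash commitment over rounded values; not the paper's scheme.
    h = hashlib.sha256(nonce)
    for v in values:
        h.update(repr(round(float(v), 6)).encode())
    return h.hexdigest()

def verify_by_rerun(model, prompt, claimed_tokens, commitment, nonce, sample_positions):
    # Re-execute inference only at a sampled subset of positions (cheap relative
    # to a full rerun) and check the prover's commitment over those values.
    # Hypothetical interface: model.logits(prefix) returns next-token logits.
    recomputed = []
    for pos in sample_positions:
        prefix = prompt + claimed_tokens[:pos]
        logits = model.logits(prefix)
        recomputed.append(logits[claimed_tokens[pos]])
    return commit(recomputed, nonce) == commitment
```

In this toy version the verifier's cost scales with the number of sampled positions rather than the full sequence, which is the intuition behind the roughly 1% verification overhead claimed above; the actual VeriLLM construction and its security argument are given in the body of the paper.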