Neural networks increasingly run on hardware outside the user's control (cloud GPUs, inference marketplaces). Yet ML-as-a-Service reveals little about what actually ran or whether returned outputs faithfully reflect the intended inputs. Users lack recourse against service downgrades (model swaps, quantization, graph rewrites, or discrepancies like altered ad embeddings). Verifying outputs is hard because floating-point (FP) execution on heterogeneous accelerators is inherently nondeterministic. Existing approaches are either impractical for real FP neural networks or reintroduce vendor trust. We present TAO: a Tolerance-Aware Optimistic verification protocol that accepts outputs within principled operator-level acceptance regions rather than requiring bitwise equality. TAO combines two error models: (i) sound per-operator IEEE-754 worst-case bounds and (ii) tight empirical percentile profiles calibrated across hardware. Discrepancies trigger a Merkle-anchored, threshold-guided dispute game that recursively partitions the computation graph until one operator remains, where adjudication reduces to a lightweight theoretical-bound check or a small honest-majority vote against empirical thresholds. Unchallenged results finalize after a challenge window, without requiring trusted hardware or deterministic kernels. We implement TAO as a PyTorch-compatible runtime and a contract layer currently deployed on the Ethereum Holesky testnet. The runtime instruments graphs, computes per-operator bounds, and runs unmodified vendor kernels in FP32 with negligible overhead (0.3% on Qwen3-8B). Across CNNs, Transformers, and diffusion models on A100, H100, RTX6000, and RTX4090 GPUs, empirical thresholds are $10^2$–$10^3\times$ tighter than theoretical bounds, and bound-aware adversarial attacks achieve 0% success. TAO thus reconciles scalability with verifiability for real-world heterogeneous ML compute.
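The two-tier acceptance logic described above can be illustrated with a minimal sketch. All function names, the max-norm deviation metric, and the three-way outcome are assumptions for illustration, not the paper's actual implementation: a claimed output is rejected outright if it violates the sound IEEE-754 worst-case bound, escalated to a dispute (where the protocol would invoke the honest-majority vote) if it exceeds only the tighter empirical threshold, and accepted otherwise.

```python
import numpy as np

def within_tolerance(claimed, reference, tol):
    """Accept iff every element deviates from the reference by at most tol."""
    return bool(np.max(np.abs(claimed - reference)) <= tol)

def adjudicate(claimed, reference, theoretical_bound, empirical_threshold):
    """Hypothetical two-tier check: the sound worst-case bound first,
    then the tighter calibrated empirical threshold."""
    if not within_tolerance(claimed, reference, theoretical_bound):
        return "reject"   # violates the sound IEEE-754 worst-case bound
    if not within_tolerance(claimed, reference, empirical_threshold):
        return "dispute"  # within the sound bound, outside the empirical one
    return "accept"

ref = np.array([1.0, 2.0, 3.0])
print(adjudicate(ref + 1e-7, ref, 1e-3, 1e-6))  # → accept
print(adjudicate(ref + 1e-4, ref, 1e-3, 1e-6))  # → dispute
print(adjudicate(ref + 1.0,  ref, 1e-3, 1e-6))  # → reject
```

The ordering matters: the theoretical bound is cheap and sound, so it filters clear violations without any vote, while the empirical threshold, being $10^2$–$10^3\times$ tighter, catches subtle downgrades at the cost of requiring adjudication.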