Neural networks increasingly run on hardware outside the user's control (cloud GPUs, inference marketplaces). Yet ML-as-a-Service reveals little about what actually ran or whether the returned outputs faithfully reflect the intended inputs. Users lack recourse against service downgrades such as model swaps, quantization, graph rewrites, or discrepancies like altered ad embeddings. Verifying outputs is hard because floating-point (FP) execution on heterogeneous accelerators is inherently nondeterministic. Existing approaches are either impractical for real FP neural networks or reintroduce vendor trust. We present TAO, a Tolerance-Aware Optimistic verification protocol that accepts outputs within principled operator-level acceptance regions rather than requiring bitwise equality. TAO combines two error models: (i) sound per-operator IEEE-754 worst-case bounds and (ii) tight empirical percentile profiles calibrated across hardware. Discrepancies trigger a Merkle-anchored, threshold-guided dispute game that recursively partitions the computation graph until a single operator remains, at which point adjudication reduces to a lightweight theoretical-bound check or a small honest-majority vote against empirical thresholds. Unchallenged results finalize after a challenge window, without requiring trusted hardware or deterministic kernels. We implement TAO as a PyTorch-compatible runtime and a contract layer currently deployed on the Ethereum Holesky testnet. The runtime instruments graphs, computes per-operator bounds, and runs unmodified vendor kernels in FP32 with negligible overhead (0.3% on Qwen3-8B). Across CNNs, Transformers, and diffusion models on A100, H100, RTX6000, and RTX4090 GPUs, empirical thresholds are $10^2$ to $10^3$ times tighter than theoretical bounds, and bound-aware adversarial attacks achieve 0% success. Overall, TAO reconciles scalability with verifiability for real-world heterogeneous ML compute.
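The dispute game described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several simplifying assumptions: the computation graph is reduced to a linear chain of scalar operators, Merkle commitments and the on-chain contract are omitted, and a single absolute tolerance per operator stands in for the theoretical bounds or empirical thresholds. All names (`within_tolerance`, `run_chain`, `bisect_dispute`, `adjudicate`) are hypothetical.

```python
from typing import Callable, List, Sequence

Op = Callable[[float], float]


def within_tolerance(claimed: float, reference: float, tol: float) -> bool:
    """Tolerance-aware acceptance: no bitwise equality is required."""
    return abs(claimed - reference) <= tol


def run_chain(ops: Sequence[Op], x: float) -> List[float]:
    """Execute the chain and record every intermediate value.

    trace[0] is the input; trace[k + 1] is the output of ops[k].
    """
    trace = [x]
    for op in ops:
        trace.append(op(trace[-1]))
    return trace


def bisect_dispute(claimed: Sequence[float],
                   challenger: Sequence[float],
                   tols: Sequence[float]) -> int:
    """Narrow a disagreement down to one operator index.

    Assumes the two traces agree at index 0 (the shared input) and
    disagree at the final index. Invariant: they agree at `lo` and
    disagree at `hi`.
    """
    lo, hi = 0, len(claimed) - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        # trace[mid] is the output of ops[mid - 1], hence tols[mid - 1].
        if within_tolerance(claimed[mid], challenger[mid], tols[mid - 1]):
            lo = mid
        else:
            hi = mid
    return lo  # ops[lo] maps trace[lo] to trace[lo + 1]


def adjudicate(op: Op, claimed_in: float, claimed_out: float, tol: float) -> bool:
    """Re-execute one operator on the agreed input and check the claim."""
    return within_tolerance(claimed_out, op(claimed_in), tol)


# Hypothetical four-operator chain with illustrative tolerances.
ops: List[Op] = [lambda v: v * 2, lambda v: v + 1, lambda v: v * v, lambda v: v - 3]
tols = [1e-6] * len(ops)

honest = run_chain(ops, 1.5)  # [1.5, 3.0, 4.0, 16.0, 13.0]

# A dishonest proposer tampers with the output of ops[2] and carries
# the altered value through the rest of the chain.
claimed = honest[:3] + [16.5, 13.5]

disputed = bisect_dispute(claimed, honest, tols)
proposer_ok = adjudicate(ops[disputed], claimed[disputed],
                         claimed[disputed + 1], tols[disputed])
print(disputed, proposer_ok)  # prints: 2 False
```

The bisection takes a number of rounds logarithmic in the chain length, and the final check re-executes only one operator, which is why adjudication can stay lightweight. In this simplified form the same tolerance is used both to steer the bisection and to settle the final operator.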