Zero-knowledge proof (ZKP) provers remain costly because multi-scalar multiplication (MSM) and number-theoretic transforms (NTTs) dominate their runtime. AI ASICs such as TPUs provide massive matrix throughput and state-of-the-art energy efficiency. We present MORPH, the first framework that reformulates ZKP kernels to match AI-ASIC execution. We introduce Big-T complexity, a hardware-aware complexity model that exposes the heterogeneous bottlenecks and layout-transformation costs that Big-O ignores. Guided by this analysis, (1) at the arithmetic level, MORPH develops an MXU-centric extended-RNS lazy reduction that converts high-precision modular arithmetic into dense low-precision GEMMs, eliminating all carry chains, and (2) at the dataflow level, MORPH constructs a unified-sharding, layout-stationary TPU Pippenger MSM and an optimized 3/5-step NTT that avoid on-TPU shuffles to minimize costly memory reorganization. Implemented in JAX, MORPH enables TPUv6e8 to achieve up to 10x higher NTT throughput than GZKP and comparable MSM throughput. Our code is available at https://github.com/EfficientPPML/MORPH.
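To make the idea of mapping an NTT onto matrix hardware concrete, the following is a minimal sketch of a textbook multi-step (Bailey-style) decomposition, in which a length-N transform with N = rows x cols becomes two small matrix multiplications separated by an elementwise twiddle scaling. It is not MORPH's implementation: the toy prime `P = 257`, the pure-Python lists, and every function name here are assumptions chosen for illustration, whereas real ZKP fields are hundreds of bits wide and MORPH itself is written in JAX. The sketch leaves the result in a transposed layout rather than reordering it, which is one plausible reading of how a multi-step variant can skip a final shuffle.

```python
P = 257  # toy prime with P - 1 = 2**8 (assumption; real ZKP fields are far larger)

def root_of_unity(n, p=P):
    """Return an element of exact order n, for n >= 2 a power of two dividing p - 1."""
    for g in range(2, p):
        w = pow(g, (p - 1) // n, p)
        if pow(w, n // 2, p) != 1:  # order divides n but not n/2, so it is exactly n
            return w
    raise ValueError("no root of unity found")

def matmul_mod(a, b, p=P):
    """Dense matrix product modulo p (stand-in for a GEMM on a matrix unit)."""
    return [[sum(x * y for x, y in zip(row, col)) % p for col in zip(*b)]
            for row in a]

def dft_matrix(n, w, p=P):
    """n x n matrix with entry (i, j) equal to w**(i*j) mod p."""
    return [[pow(w, i * j, p) for j in range(n)] for i in range(n)]

def ntt_naive(x, w, p=P):
    """Reference O(N^2) transform: X[k] = sum_j x[j] * w**(j*k) mod p."""
    n = len(x)
    return [sum(x[j] * pow(w, j * k, p) for j in range(n)) % p for k in range(n)]

def ntt_three_step(x, rows, cols, w, p=P):
    """Multi-step NTT of length rows*cols; returns a rows x cols matrix D
    with D[k1][k2] = X[k1 + rows*k2] (transposed output layout)."""
    # View the input as a row-major rows x cols matrix.
    a = [x[r * cols:(r + 1) * cols] for r in range(rows)]
    # Step 1: length-`rows` NTT down every column, as one matrix product.
    b = matmul_mod(dft_matrix(rows, pow(w, cols, p), p), a, p)
    # Step 2: elementwise twiddle factors w**(k1*j2).
    b = [[b[k1][j2] * pow(w, k1 * j2, p) % p for j2 in range(cols)]
         for k1 in range(rows)]
    # Step 3: length-`cols` NTT along every row, as a second matrix product.
    return matmul_mod(b, dft_matrix(cols, pow(w, rows, p), p), p)

def to_natural_order(d):
    """Read the transposed-layout result back in natural index order."""
    rows, cols = len(d), len(d[0])
    return [d[k % rows][k // rows] for k in range(rows * cols)]

# Usage: a length-16 transform computed as a 4 x 4 problem.
x = list(range(16))
w = root_of_unity(16)
assert to_natural_order(ntt_three_step(x, 4, 4, w)) == ntt_naive(x, w)
```

The only data movement outside the two matrix products is the final index remapping in `to_natural_order`, which exists here purely to check the result against the naive transform; a consumer that accepts the transposed layout would not need it.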