APINT: A Full-Stack Framework for Acceleration of Privacy-Preserving Inference of Transformers based on Garbled Circuits

As Privacy-Preserving Inference of Transformers (PiT) grows in importance, hybrid protocols that combine Garbled Circuits (GC) and Homomorphic Encryption (HE) are emerging as a way to implement it. Although such protocols are favored because they preserve accuracy, they suffer from excessive latency. Existing protocols have focused primarily on reducing HE latency, which has made GC the new latency bottleneck. Furthermore, previous studies targeted individual computing layers, such as the protocol or the hardware accelerator, and lacked a comprehensive system-level solution. This paper presents APINT, a full-stack framework that reduces PiT's overall latency by tackling the GC latency problem with both software and hardware solutions. APINT features a novel protocol that reallocates GC workloads, wherever possible, to alternative methods (i.e., HE or standard matrix operations), substantially decreasing the GC workload. It also introduces GC-friendly circuit generation that minimizes the number of AND gates, the most expensive operator in GC. Furthermore, APINT proposes a netlist scheduling scheme that combines coarse-grained operation mapping with fine-grained scheduling to maximize data reuse and minimize dependencies. Finally, APINT's hardware accelerator, combined with its compiler speculation, effectively resolves the memory stall issue. Putting it all together, APINT achieves a substantial end-to-end latency reduction, outperforming the existing protocol on a CPU platform by 12.2x in the online phase and 2.2x in the offline phase. Meanwhile, compared with the state-of-the-art GC accelerator, the APINT accelerator reduces latency by 3.3x and energy consumption by 4.6x when running PiT.
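To illustrate why reducing AND gates matters, consider the widely used free-XOR optimization, under which XOR gates can be garbled without any ciphertexts while every AND gate still costs garbled ciphertexts. In that setting the AND count roughly sets the GC cost. The following is a minimal sketch, assuming free-XOR and a generic textbook trick; it is not APINT's actual circuit generator, and the names `Circuit`, `full_adder_naive`, `full_adder_gc_friendly`, and `ripple_add` are hypothetical. It counts AND gates in two functionally identical ripple-carry adders: one builds the carry with a generic OR, the other uses the one-AND majority form `((a ^ cin) & (b ^ cin)) ^ cin`.

```python
from itertools import product


class Circuit:
    """Minimal boolean-circuit evaluator that counts gates by type (illustrative only)."""

    def __init__(self):
        self.counts = {"AND": 0, "XOR": 0}

    def and_gate(self, x, y):
        self.counts["AND"] += 1
        return x & y

    def xor_gate(self, x, y):
        self.counts["XOR"] += 1
        return x ^ y

    def or_gate(self, x, y):
        # OR built from AND/XOR only: x | y == x ^ y ^ (x & y), so it costs one AND
        return self.xor_gate(self.xor_gate(x, y), self.and_gate(x, y))


def full_adder_naive(c, a, b, cin):
    # carry = (a & b) | (cin & (a ^ b)): three AND gates per bit
    s = c.xor_gate(c.xor_gate(a, b), cin)
    carry = c.or_gate(c.and_gate(a, b), c.and_gate(cin, c.xor_gate(a, b)))
    return s, carry


def full_adder_gc_friendly(c, a, b, cin):
    # carry = ((a ^ cin) & (b ^ cin)) ^ cin: a single AND gate per bit
    s = c.xor_gate(c.xor_gate(a, b), cin)
    carry = c.xor_gate(c.and_gate(c.xor_gate(a, cin), c.xor_gate(b, cin)), cin)
    return s, carry


def ripple_add(x, y, n, full_adder):
    """Add two n-bit unsigned ints (mod 2**n); return (sum, number of AND gates)."""
    c = Circuit()
    carry, result = 0, 0
    for i in range(n):
        s, carry = full_adder(c, (x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, c.counts["AND"]


# Both full adders agree with integer addition on all 8 input combinations.
for a, b, cin in product((0, 1), repeat=3):
    for fa in (full_adder_naive, full_adder_gc_friendly):
        s, carry = fa(Circuit(), a, b, cin)
        assert s + 2 * carry == a + b + cin

for name, fa in (("naive", full_adder_naive), ("gc-friendly", full_adder_gc_friendly)):
    total, ands = ripple_add(200, 100, 8, fa)
    print(name, total, ands)
# naive 44 24
# gc-friendly 44 8
```

Both adders compute the same 8-bit sum (300 mod 256 = 44), but the rewritten carry logic needs one AND gate per bit instead of three. Under free-XOR, that cuts the garbling work for this circuit roughly threefold, which is the kind of saving GC-friendly circuit generation targets.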
