Fully Homomorphic Encryption (FHE) enables privacy-preserving Transformer inference, but long-sequence encrypted Transformers quickly exceed single-GPU memory capacity because encoded weights are already large and encrypted activations grow rapidly with sequence length. Multi-GPU execution therefore becomes unavoidable, yet scaling remains challenging because communication is jointly induced by application-level aggregation and encryption-level RNS coupling. Existing approaches either synchronize between devices frequently or replicate encrypted tensors across devices, leading to excessive communication and latency. We present AEGIS, an Application-Encryption Guided Inference System for scalable long-sequence encrypted Transformer inference on multi-GPU platforms. AEGIS derives device placement from ciphertext dependencies jointly induced by Transformer dataflow and CKKS polynomial coupling, co-locating modulus-coherent and token-coherent data so that communication is introduced only when application dependencies require it, while reordering polynomial operators to overlap the remaining collectives with computation. On 2048-token inputs, AEGIS reduces inter-GPU communication by up to 57.9% in feed-forward networks and 81.3% in self-attention versus prior state-of-the-art designs. On four GPUs, it achieves up to 96.62% scaling efficiency, 3.86x end-to-end speedup, and 69.1% per-device memory reduction. These results establish coordinated application-encryption parallelism as a practical foundation for scalable homomorphic Transformer inference.
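The "encryption-level RNS coupling" mentioned above can be illustrated with a toy residue number system (RNS). The sketch below is not AEGIS's implementation and uses no FHE library: the moduli and the helper names `to_rns`, `rns_mul`, and `from_rns` are hypothetical and chosen only for illustration. It shows that multiplication proceeds independently on each residue (limb), so limbs could in principle live on different devices, whereas recovering the full value needs every limb at once. That need for all limbs is the kind of dependency that forces communication when limbs are spread across GPUs.

```python
from math import prod

# Small pairwise-coprime moduli, for illustration only.
# Real CKKS parameter sets use much larger primes and polynomial coefficients.
MODULI = (97, 101, 103)


def to_rns(x, moduli=MODULI):
    """Split an integer into one residue (limb) per modulus."""
    return tuple(x % q for q in moduli)


def rns_mul(a, b, moduli=MODULI):
    """Multiply limb by limb; no limb reads data from any other limb."""
    return tuple((ai * bi) % q for ai, bi, q in zip(a, b, moduli))


def from_rns(limbs, moduli=MODULI):
    """Chinese Remainder Theorem reconstruction; requires ALL limbs together."""
    big_q = prod(moduli)
    total = 0
    for r, q in zip(limbs, moduli):
        q_hat = big_q // q                       # product of the other moduli
        total += r * q_hat * pow(q_hat, -1, q)   # modular inverse (Python 3.8+)
    return total % big_q


x, y = 1234, 567
product_limbs = rns_mul(to_rns(x), to_rns(y))   # limb-parallel step
recovered = from_rns(product_limbs)             # coupled step across limbs
```

Because `x * y` is smaller than the product of the moduli, `recovered` equals `x * y`. In this toy model `rns_mul` is the communication-free part, and `from_rns` stands in for any step that must see all limbs together.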