Centaur: Bridging the Impossible Trinity of Privacy, Efficiency, and Performance in Privacy-Preserving Transformer Inference

As pre-trained models such as Transformers are increasingly deployed on cloud platforms for inference services, privacy concerns surrounding model parameters and inference data are becoming more acute. Current Privacy-Preserving Transformer Inference (PPTI) frameworks struggle with the "impossible trinity" of privacy, efficiency, and performance. For instance, Secure Multi-Party Computation (SMPC)-based solutions offer strong privacy guarantees but incur significant inference overhead and some loss of performance. In contrast, PPTI frameworks based on random permutations achieve inference efficiency close to that of plaintext and maintain accurate results, but they require exposing some model parameters and intermediate results, which risks substantial privacy breaches. Resolving this "impossible trinity" with a single technique is difficult. We therefore propose Centaur, a novel hybrid PPTI framework. Unlike existing methods, Centaur exploits the structure of Transformer models to protect model parameters with random permutations and inference data with SMPC. Through a series of efficient privacy-preserving algorithms, Centaur combines the strengths of both techniques to achieve a better balance between privacy, efficiency, and performance in PPTI. We comprehensively evaluate Centaur on various types of Transformer models and datasets. Experimental results demonstrate that the privacy protection offered by Centaur withstands various existing model inversion attack methods. In terms of performance and efficiency, Centaur not only maintains the same performance as plaintext inference but also improves inference speed by $5.0$ to $30.4$ times.
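The hybrid idea can be illustrated with a toy linear layer. The sketch below is not Centaur's actual algorithm, which the abstract does not specify. It only assumes that the weights are hidden by a secret column permutation and the input is hidden by two-party additive secret sharing. The ring size `MOD`, the helper names `share` and `reconstruct`, and the use of small non-negative integers in place of a fixed-point encoding are all illustrative assumptions.

```python
import numpy as np

MOD = np.uint64(2**32)  # ring size for additive sharing (illustrative choice)

def share(x, rng):
    """Split x into two additive shares so that x == (s0 + s1) mod MOD."""
    s0 = rng.integers(0, 2**32, size=x.shape, dtype=np.uint64)
    s1 = (x.astype(np.uint64) - s0) % MOD  # uint64 wrap-around is consistent with mod 2**32
    return s0, s1

def reconstruct(s0, s1):
    """Recombine two additive shares."""
    return (s0 + s1) % MOD

rng = np.random.default_rng(0)
X = rng.integers(0, 100, size=(2, 4), dtype=np.uint64)  # private inference data
W = rng.integers(0, 100, size=(4, 3), dtype=np.uint64)  # private model weights

# Model owner: hide the weights behind a secret permutation of the output columns.
perm = rng.permutation(W.shape[1])
W_perm = W[:, perm]  # the only form of W the computing parties see

# Data owner: secret-share the input between two computing parties.
X0, X1 = share(X, rng)

# Each party multiplies its share by the permuted plaintext weights locally;
# a share-times-plaintext product is linear, so no interaction is needed here.
Y0 = (X0 @ W_perm) % MOD
Y1 = (X1 @ W_perm) % MOD

# Reconstruction yields the permuted output; the permutation holder undoes it.
Y_perm = reconstruct(Y0, Y1)
Y = Y_perm[:, np.argsort(perm)]

assert np.array_equal(Y, (X @ W) % MOD)
```

In this sketch neither computing party sees `X` or the unpermuted `W`, and the linear layer needs no communication between the parties. Nonlinear layers are deliberately omitted, since handling them is where a real framework would need its dedicated protocols.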
