Secure multi-party computation (MPC) enables computation directly on encrypted data and protects both data and model privacy in deep learning inference. However, existing neural network architectures, including Vision Transformers (ViTs), are neither designed nor optimized for MPC and incur significant latency overhead. We observe that Softmax is the major latency bottleneck because of its high communication complexity, but that it can be selectively replaced or linearized without compromising model accuracy. Hence, in this paper, we propose an MPC-friendly ViT, dubbed MPCViT, to enable accurate yet efficient ViT inference in MPC. Based on a systematic latency and accuracy evaluation of Softmax attention and other attention variants, we propose a heterogeneous attention optimization space. We also develop a simple yet effective MPC-aware neural architecture search algorithm for fast Pareto optimization. To further boost inference efficiency, we propose MPCViT+, which jointly optimizes Softmax attention and other network components, including GeLU and matrix multiplication. With extensive experiments, we demonstrate that MPCViT achieves 1.9%, 1.3% and 4.6% higher accuracy with 6.2x, 2.9x and 1.9x latency reduction compared with baseline ViT, MPCFormer and THE-X, respectively, on the Tiny-ImageNet dataset. MPCViT+ further achieves a 1.2x latency reduction on the CIFAR-100 dataset and reaches a better Pareto front than MPCViT.
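To make the idea of a heterogeneous attention space concrete, the following is a minimal plaintext sketch (no MPC protocol involved) in which each attention head either keeps Softmax or falls back to a linearized variant. The function names, the `keep_softmax` mask, and the choice of dividing scores by the token count as the linear variant are all assumptions made for illustration; they are not taken from the paper and should not be read as the MPCViT implementation.

```python
import numpy as np

def softmax_attention(q, k, v):
    """Standard scaled dot-product attention for one head.
    q, k, v have shape (tokens, dim)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Subtract the row maximum for numerical stability.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

def scaled_attention(q, k, v):
    """Hypothetical linearized variant: no exponential and no
    data-dependent normalization, only a constant scale by the
    number of tokens (an assumption for this sketch)."""
    num_tokens = k.shape[0]
    return (q @ k.T / num_tokens) @ v

def heterogeneous_attention(q, k, v, keep_softmax):
    """Apply a per-head choice of attention type.
    q, k, v have shape (heads, tokens, dim); keep_softmax is a
    sequence of booleans, one per head (hypothetical search output)."""
    outputs = []
    for head, use_softmax in enumerate(keep_softmax):
        attn = softmax_attention if use_softmax else scaled_attention
        outputs.append(attn(q[head], k[head], v[head]))
    return np.stack(outputs)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.standard_normal((2, 4, 8))
    k = rng.standard_normal((2, 4, 8))
    v = rng.standard_normal((2, 4, 8))
    # Head 0 keeps Softmax, head 1 uses the linearized variant.
    out = heterogeneous_attention(q, k, v, keep_softmax=[True, False])
    print(out.shape)
```

In a setup like this, an architecture search would choose the `keep_softmax` mask so that only the heads that matter most for accuracy pay the Softmax cost; how that mask is actually searched is outside the scope of this sketch.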