Secure multi-party computation (MPC) enables computation directly on encrypted data and protects both data and model privacy in deep learning inference. However, existing neural network architectures, including Vision Transformers (ViTs), are neither designed nor optimized for MPC and incur significant latency overhead. We observe that Softmax is the major latency bottleneck because of its high communication complexity, but that it can be selectively replaced or linearized without compromising model accuracy. Hence, in this paper, we propose an MPC-friendly ViT, dubbed MPCViT, to enable accurate yet efficient ViT inference in MPC. Based on a systematic latency and accuracy evaluation of Softmax attention and other attention variants, we propose a heterogeneous attention optimization space. We also develop a simple yet effective MPC-aware neural architecture search algorithm for fast Pareto optimization. To further boost inference efficiency, we propose MPCViT+, which jointly optimizes the Softmax attention and other network components such as GeLU and matrix multiplication. With extensive experiments on the Tiny-ImageNet dataset, we demonstrate that MPCViT achieves 1.9%, 1.3% and 3.6% higher accuracy with 6.2x, 2.9x and 1.9x latency reduction compared with baseline ViT, MPCFormer and THE-X, respectively. MPCViT+ further achieves a better Pareto front than MPCViT. The code and models for evaluation are available at https://github.com/PKU-SEC-Lab/mpcvit.
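To make the idea of a heterogeneous attention space concrete, the following is a minimal plaintext sketch in which each attention head either keeps Softmax or falls back to a Softmax-free variant. It is an illustration under stated assumptions, not the MPCViT implementation: the abstract does not name the alternative attention variants, so `scaled_linear_attention` is a hypothetical stand-in, and the per-head granularity and the `keep_softmax` mask are likewise assumed. The sketch runs in the clear with NumPy and involves no MPC protocol.

```python
import numpy as np


def softmax_attention(q, k, v):
    """Standard scaled dot-product attention for one head; q, k, v are (tokens, dim)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Subtract the row maximum for numerical stability before exponentiating.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v


def scaled_linear_attention(q, k, v):
    """Hypothetical Softmax-free variant (an assumption, not taken from the paper).

    The scores are divided by the token count instead of being normalized with
    exp / max / sum, so only matrix multiplications and constant scalings remain.
    """
    num_tokens = q.shape[0]
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return (scores / num_tokens) @ v


def heterogeneous_attention(q, k, v, keep_softmax):
    """Apply a per-head choice of attention type.

    q, k, v: arrays of shape (heads, tokens, dim).
    keep_softmax: one boolean per head; True keeps Softmax for that head.
    """
    outputs = []
    for head, keep in enumerate(keep_softmax):
        attention = softmax_attention if keep else scaled_linear_attention
        outputs.append(attention(q[head], k[head], v[head]))
    return np.stack(outputs)


rng = np.random.default_rng(0)
q = rng.normal(size=(2, 4, 8))
k = rng.normal(size=(2, 4, 8))
v = rng.normal(size=(2, 4, 8))

# Head 0 keeps Softmax; head 1 uses the Softmax-free stand-in.
out = heterogeneous_attention(q, k, v, keep_softmax=[True, False])
```

In this framing, an architecture search would decide the entries of `keep_softmax`, trading the accuracy of Softmax heads against the lower cost of the Softmax-free ones.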