The rapid growth and deployment of deep learning (DL) has raised new privacy and security concerns. To mitigate these issues, secure multi-party computation (MPC) has been proposed to enable privacy-preserving DL computation. In practice, MPC protocols often incur very high computation and communication overhead, which may hinder their adoption in large-scale systems. Two orthogonal research directions have attracted considerable interest in addressing energy efficiency in secure deep learning: overhead reduction of the MPC comparison protocol, and hardware acceleration. However, existing approaches either achieve a low reduction ratio and suffer from high latency because of limited computation and communication savings, or are power-hungry because they mainly target general-purpose computing platforms such as CPUs and GPUs. In this work, as a first attempt, we develop a systematic framework, PolyMPCNet, for joint overhead reduction of the MPC comparison protocol and hardware acceleration. It integrates the hardware latency of the cryptographic building block into the deep neural network (DNN) loss function to achieve high energy efficiency, accuracy, and security guarantees. Instead of heuristically checking model sensitivity after a DNN is well trained (by deleting or dropping some non-polynomial operators), our key design principle is to enforce exactly what is assumed in the DNN design: training a DNN that is both hardware-efficient and secure, while escaping local minima and saddle points and maintaining high accuracy. More specifically, we propose a straight-through polynomial activation initialization method for a cryptographic-hardware-friendly trainable polynomial activation function that replaces the expensive 2P-ReLU operator. We also develop a cryptographic hardware scheduler and the corresponding performance model for the field-programmable gate array (FPGA) platform.
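To make the two central ideas concrete, the following is a minimal sketch and not the PolyMPCNet implementation. It assumes a second-order polynomial activation whose initial coefficients reproduce the identity function (one plausible reading of "straight-through" initialization), and a loss that adds a weighted latency term to the task loss. The names `PolyActivation`, `latency_aware_loss`, `gates`, and `lam`, as well as all latency values, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PolyActivation:
    """Trainable polynomial f(x) = w2*x^2 + w1*x + b (hypothetical form)."""
    w2: float = 0.0  # assumed initial value: no quadratic term
    w1: float = 1.0  # assumed initial value: identity slope
    b: float = 0.0   # assumed initial value: no offset

    def __call__(self, x: float) -> float:
        return self.w2 * x * x + self.w1 * x + self.b


def latency_aware_loss(task_loss: float,
                       gates: Sequence[float],
                       relu_latency_ms: float,
                       poly_latency_ms: float,
                       lam: float) -> float:
    """Return task_loss + lam * expected activation latency.

    gates[i] in [0, 1] is an assumed per-layer weight on keeping ReLU
    (1.0) versus using the polynomial activation (0.0).
    """
    expected_latency = sum(
        g * relu_latency_ms + (1.0 - g) * poly_latency_ms for g in gates
    )
    return task_loss + lam * expected_latency
```

With these assumed defaults, `PolyActivation()` passes its input through unchanged at initialization, and the latency term falls as more layers move from ReLU to the polynomial activation, provided the polynomial latency is the smaller of the two.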