Private convolutional neural network (CNN) inference based on secure two-party computation (2PC) suffers from high communication and latency overhead, especially in convolution layers. In this paper, we propose EQO, a quantized 2PC inference framework that jointly optimizes the CNNs and the 2PC protocols. EQO features a novel 2PC protocol that combines Winograd transformation with quantization for efficient convolution computation. However, we observe that naively combining quantization and Winograd convolution is sub-optimal: Winograd transformations introduce extensive local additions and weight outliers, which increase the quantization bit widths and require frequent bit width conversions with non-negligible communication overhead. Therefore, at the protocol level, we propose a series of optimizations for the 2PC inference graph to minimize communication. At the network level, we develop a sensitivity-based mixed-precision quantization algorithm to optimize network accuracy under communication constraints. We further propose a 2PC-friendly bit re-weighting algorithm to accommodate weight outliers without increasing bit widths. In extensive experiments, EQO achieves 11.7x, 3.6x, and 6.3x communication reduction with 1.29%, 1.16%, and 1.29% higher accuracy compared to the state-of-the-art frameworks SiRNN, COINN, and CoPriv, respectively.
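To illustrate why the local additions of a Winograd transformation can widen quantization bit widths, the following minimal sketch uses the standard 1D Winograd F(2,3) matrices in plain Python. It is not EQO's protocol: there is no secret sharing or 2PC here, and the 4-bit signed activation range as well as the helper names (`winograd_f23`, `transformed_range`, `signed_bits`) are assumptions made only for this example.

```python
from fractions import Fraction

# Standard 1D Winograd F(2,3) transformation matrices.
BT = [[1, 0, -1, 0],
      [0, 1, 1, 0],
      [0, -1, 1, 0],
      [0, 1, 0, -1]]
HALF = Fraction(1, 2)
G = [[Fraction(1), 0, 0],
     [HALF, HALF, HALF],
     [HALF, -HALF, HALF],
     [0, 0, Fraction(1)]]
AT = [[1, 1, 1, 0],
      [0, 1, -1, -1]]


def matvec(matrix, vec):
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def winograd_f23(d, g):
    """Two outputs of a 3-tap filter g over a 4-element tile d."""
    u = matvec(G, g)    # weight transformation
    v = matvec(BT, d)   # input transformation (additions only)
    m = [ui * vi for ui, vi in zip(u, v)]  # 4 element-wise products
    return matvec(AT, m)  # output transformation


def direct_conv(d, g):
    """Reference sliding-window computation (6 multiplications)."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]


def transformed_range(matrix, lo, hi):
    """Worst-case value range of matrix @ x when every x[i] lies in [lo, hi]."""
    row_lo = min(sum(c * (lo if c > 0 else hi) for c in row) for row in matrix)
    row_hi = max(sum(c * (hi if c > 0 else lo) for c in row) for row in matrix)
    return row_lo, row_hi


def signed_bits(lo, hi):
    """Smallest two's-complement width that holds every integer in [lo, hi]."""
    bits = 1
    while not (-(1 << (bits - 1)) <= lo and hi <= (1 << (bits - 1)) - 1):
        bits += 1
    return bits


if __name__ == "__main__":
    d, g = [1, 2, 3, 4], [1, 0, -1]
    print(winograd_f23(d, g) == direct_conv(d, g))
    lo, hi = transformed_range(BT, -8, 7)  # assumed 4-bit signed activations
    print(signed_bits(-8, 7), "bits before,", signed_bits(lo, hi), "bits after")
```

In this 1D setting the input transformation needs one extra bit; a 2D tile applies the transformation along both dimensions, so the growth compounds, which is consistent with the bit width conversions described above.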