The growth of the Machine-Learning-as-a-Service (MLaaS) market has highlighted clients' data privacy and security concerns. Private inference (PI) techniques built on cryptographic primitives offer a solution, but they often incur high computation and communication costs, particularly for non-linear operators such as ReLU. Many attempts to reduce the number of ReLU operations exist, but they may require heuristic threshold selection or cause substantial accuracy loss. This work introduces AutoReP, a gradient-based approach that reduces the number of non-linear operators and alleviates these issues. It automates the choice between ReLU and polynomial functions to speed up PI applications, and it introduces distribution-aware polynomial approximation (DaPa) to maintain model expressivity while accurately approximating ReLUs. Our experimental results demonstrate significant accuracy improvements over current state-of-the-art methods such as SNL: 6.12% (reaching 94.31% with a 12.9K ReLU budget on CIFAR-10), 8.39% (reaching 74.92% with a 12.9K ReLU budget on CIFAR-100), and 9.45% (reaching 63.69% with a 55K ReLU budget on Tiny-ImageNet). Moreover, when applied to EfficientNet-B2 on the ImageNet dataset, AutoReP achieves 75.55% accuracy with a 176.1× reduction in ReLU budget.
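To make the idea of a distribution-aware polynomial replacement for ReLU concrete, the following is a minimal sketch and not the paper's DaPa implementation. It assumes, purely for illustration, that the pre-activation inputs follow a Gaussian with known mean `mu` and standard deviation `sigma`, and that a degree-2 polynomial is used; the function names `fit_quadratic_to_relu` and `solve_3x3` are hypothetical. The fit minimizes the squared error between the polynomial and ReLU, weighted by the assumed input density.

```python
import math


def relu(x):
    return max(0.0, x)


def solve_3x3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    n = 3
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def fit_quadratic_to_relu(mu=0.0, sigma=1.0, num_points=2001, span=6.0):
    """Return (a, b, c) such that a + b*x + c*x**2 approximates relu(x)
    in the least-squares sense, weighted by an assumed N(mu, sigma^2) density.

    The Gaussian assumption and the degree-2 choice are illustrative only.
    """
    lo = mu - span * sigma
    step = 2.0 * span * sigma / (num_points - 1)
    A = [[0.0] * 3 for _ in range(3)]  # weighted Gram matrix of (1, x, x^2)
    rhs = [0.0] * 3                    # weighted inner products with relu
    for i in range(num_points):
        x = lo + i * step
        w = math.exp(-0.5 * ((x - mu) / sigma) ** 2)  # unnormalized density
        basis = (1.0, x, x * x)
        y = relu(x)
        for r in range(3):
            rhs[r] += w * basis[r] * y
            for c in range(3):
                A[r][c] += w * basis[r] * basis[c]
    return tuple(solve_3x3(A, rhs))


if __name__ == "__main__":
    a, b, c = fit_quadratic_to_relu(mu=0.0, sigma=1.0)
    print(a, b, c)
```

For a standard normal input, the weighted least-squares problem has the closed-form solution b = 1/2 and a = c = 1/(2√(2π)) ≈ 0.1995, which the numerical fit should reproduce closely. Shifting `mu` well above zero makes almost all inputs positive, so the fitted polynomial approaches the identity; this is the sense in which the coefficients depend on the input distribution rather than on a fixed approximation interval.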