The large number of ReLU and multiply-accumulate (MAC) operations in deep neural networks makes them ill-suited for latency- and compute-efficient private inference. In this paper, we present a model optimization method that allows a model to learn to be shallow. In particular, we use the ReLU sensitivity of a convolutional block to decide where to remove a ReLU layer, and we then merge the convolution layers that precede and follow it into a single shallow block. Unlike existing ReLU reduction methods, our joint reduction method yields models with improved reduction of both ReLUs and linear operations, by up to 1.73x and 1.47x respectively, evaluated with ResNet18 on CIFAR-100 and without any significant accuracy drop.
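The merging step rests on a standard linear-algebra fact: once the ReLU between two convolutions is removed, the two layers compose into a single linear map. The sketch below illustrates only that fact, under simplifying assumptions that are not from the paper: a single-channel 1-D signal, integer weights, no bias, no padding, and stride 1. The helper names `correlate_valid`, `merge_kernels`, and `relu` are hypothetical, and the sketch does not model how ReLU sensitivity is measured or learned.

```python
def correlate_valid(x, k):
    """1-D 'valid' cross-correlation, the operation a conv layer computes."""
    n = len(x) - len(k) + 1
    return [sum(x[i + a] * k[a] for a in range(len(k))) for i in range(n)]


def merge_kernels(k1, k2):
    """Single kernel equivalent to applying k1, then k2, with no activation between."""
    merged = [0] * (len(k1) + len(k2) - 1)
    for a, w1 in enumerate(k1):
        for b, w2 in enumerate(k2):
            merged[a + b] += w1 * w2  # weights at offsets a and b land at offset a + b
    return merged


def relu(v):
    return [max(0, t) for t in v]


# Toy input and kernels (illustrative values, not from the paper).
x = [1, -2, 3, 0, -1, 4, 2, -3]
k1 = [1, 0, -1]
k2 = [2, 1, -1]

two_layer = correlate_valid(correlate_valid(x, k1), k2)        # conv -> conv
one_layer = correlate_valid(x, merge_kernels(k1, k2))          # merged shallow block
with_relu = correlate_valid(relu(correlate_valid(x, k1)), k2)  # conv -> ReLU -> conv

print(two_layer == one_layer)  # merging is exact once the ReLU is gone
print(two_layer == with_relu)  # removing the ReLU changes the function
```

In this toy setting the merged kernel reproduces the two-layer output exactly, while the version with the ReLU in place computes a different function. This is why a removal has to be guided by how sensitive the block is to that ReLU. Whether merging also lowers the MAC count in a real multi-channel 2-D network depends on kernel sizes and channel widths, which this sketch does not capture.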