To reduce the computational cost of convolutional neural networks (CNNs) for use on resource-constrained devices, structured pruning approaches have shown promising results, drastically reducing floating-point operations (FLOPs) without substantial drops in accuracy. However, most recent methods require fine-tuning or specific training procedures to achieve a reasonable trade-off between retained accuracy and reduction in FLOPs. This adds computational overhead and requires training data to be available. To address this, we propose HASTE (Hashing for Tractable Efficiency), a parameter-free and data-free module that acts as a plug-and-play replacement for any regular convolution module. It instantly reduces the network's test-time inference cost without requiring any training or fine-tuning. By using locality-sensitive hashing (LSH) to detect redundancies in the channel dimension, we drastically compress latent feature maps without sacrificing much accuracy. Similar channels are aggregated to reduce the input and filter depth simultaneously, allowing for cheaper convolutions. We demonstrate our approach on the popular vision benchmarks CIFAR-10 and ImageNet. In particular, simply swapping the convolution modules in a ResNet34 on CIFAR-10 for our HASTE module instantly removes 46.72% of FLOPs while losing only 1.25% accuracy.
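The channel-aggregation idea can be illustrated with a minimal NumPy sketch. It is not the HASTE implementation. It assumes random-hyperplane (sign) hashing as the LSH scheme, hashes whole feature maps rather than any particular sub-region, and merges a bucket by averaging its input channels and summing the matching filter slices. The function names `lsh_channel_buckets`, `merge_channels`, and `conv2d`, and the parameter `num_planes`, are hypothetical and chosen only for this illustration.

```python
import numpy as np

def lsh_channel_buckets(x, num_planes=8, seed=0):
    """Assign each channel of x (C, H, W) to a bucket via random-hyperplane LSH.

    Channels whose flattened maps fall on the same side of every random
    hyperplane receive the same hash code and therefore the same bucket id.
    """
    c = x.shape[0]
    flat = x.reshape(c, -1)
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((flat.shape[1], num_planes))
    bits = (flat @ planes) >= 0                      # (C, num_planes) sign pattern
    codes = bits.astype(np.int64) @ (1 << np.arange(num_planes, dtype=np.int64))
    _, bucket_ids = np.unique(codes, return_inverse=True)
    return bucket_ids.ravel()

def merge_channels(x, weights, bucket_ids):
    """Average input channels per bucket and sum the matching filter slices.

    x: (C, H, W), weights: (C_out, C, k, k). Returns reduced tensors with
    C replaced by the number of buckets.
    """
    c = x.shape[0]
    num_buckets = int(bucket_ids.max()) + 1
    onehot = np.zeros((num_buckets, c))
    onehot[bucket_ids, np.arange(c)] = 1.0
    counts = onehot.sum(axis=1, keepdims=True)
    x_red = np.tensordot(onehot / counts, x, axes=1)        # mean over each bucket
    w_red = np.einsum("bc,ockl->obkl", onehot, weights)     # sum over each bucket
    return x_red, w_red

def conv2d(x, weights):
    """Plain 'valid' cross-correlation with stride 1, used only for checking."""
    k = weights.shape[-1]
    windows = np.lib.stride_tricks.sliding_window_view(x, (k, k), axis=(1, 2))
    return np.einsum("chwkl,ockl->ohw", windows, weights)

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    base = rng.standard_normal((4, 8, 8))
    x = np.concatenate([base, base], axis=0)          # 8 channels, 4 of them redundant
    w = rng.standard_normal((16, 8, 3, 3))
    ids = lsh_channel_buckets(x, num_planes=16)
    x_red, w_red = merge_channels(x, w, ids)
    print("input depth:", x.shape[0], "->", x_red.shape[0])
    print("max abs difference:", np.abs(conv2d(x, w) - conv2d(x_red, w_red)).max())
```

Because convolution is linear in the input channels, merging channels that are exactly identical leaves the output unchanged, while the cost of the convolution scales with the reduced depth. For channels that are only approximately similar, the merged result is an approximation, and the number of hyperplanes controls how strict the notion of "similar" is.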