To reduce the computational cost of convolutional neural networks (CNNs) on resource-constrained devices, structured pruning approaches have shown promise in lowering floating-point operations (FLOPs) without substantial drops in accuracy. However, most methods require fine-tuning or specific training procedures to achieve a reasonable trade-off between retained accuracy and reduction in FLOPs, which adds computational overhead and requires training data to be available. To address this, we propose HASTE (Hashing for Tractable Efficiency), a data-free, plug-and-play convolution module that instantly reduces a network's test-time inference cost without any training or fine-tuning. Our approach utilizes locality-sensitive hashing (LSH) to detect redundancies in the channel dimension of latent feature maps, compressing similar channels to reduce input and filter depth simultaneously, resulting in cheaper convolutions. We demonstrate our approach on the popular vision benchmarks CIFAR-10 and ImageNet; on CIFAR-10, swapping the convolution modules of a ResNet34 for our HASTE module achieves a 46.72% reduction in FLOPs with only a 1.25% loss in accuracy.
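To make the channel-merging idea concrete, the following is a minimal PyTorch sketch of the general principle described above: random-hyperplane LSH buckets similar input channels, channels in the same bucket are averaged, and the corresponding filter slices are summed, so the convolution runs at a reduced channel depth. This is an illustrative approximation under our own assumptions, not the paper's actual implementation; the names `hashed_conv2d`, `lsh_codes`, and `num_hashes` are hypothetical.

```python
import torch
import torch.nn.functional as F

def lsh_codes(x, hyperplanes):
    """Random-hyperplane LSH: sign pattern of projections -> integer bucket id.

    x: (C, D) flattened channels; hyperplanes: (D, L) random directions.
    """
    bits = (x @ hyperplanes) > 0                    # (C, L) sign bits
    powers = 2 ** torch.arange(bits.shape[1])       # binary weights per bit
    return (bits.long() * powers).sum(dim=1)        # (C,) bucket ids

def hashed_conv2d(x, weight, num_hashes=8):
    """Merge near-duplicate input channels before convolving (sketch).

    x: (1, C, H, W) feature map; weight: (K, C, kh, kw) filters.
    Since convolution is linear in its input channels, two nearly identical
    channels X_a ~ X_b satisfy W_a*X_a + W_b*X_b ~ (W_a + W_b) * mean(X_a, X_b),
    so averaging bucketed channels and summing the matching filter slices
    approximates the original output at lower cost.
    """
    _, C, H, W = x.shape
    flat = x[0].reshape(C, -1)
    # Center and normalize so the hyperplane test compares channel direction,
    # not magnitude (hashing only; the merged values use the raw channels).
    flat = flat - flat.mean(dim=1, keepdim=True)
    flat = flat / (flat.norm(dim=1, keepdim=True) + 1e-8)
    planes = torch.randn(flat.shape[1], num_hashes)
    codes = lsh_codes(flat, planes)

    merged_x, merged_w = [], []
    for b in codes.unique():
        idx = (codes == b).nonzero(as_tuple=True)[0]
        merged_x.append(x[:, idx].mean(dim=1))      # average similar channels
        merged_w.append(weight[:, idx].sum(dim=1))  # sum matching filter slices
    x_red = torch.stack(merged_x, dim=1)            # (1, C', H, W), C' <= C
    w_red = torch.stack(merged_w, dim=1)            # (K, C', kh, kw)
    return F.conv2d(x_red, w_red, padding=weight.shape[-1] // 2)

# Usage: replace a standard convolution with the hashed variant.
x = torch.randn(1, 64, 32, 32)
w = torch.randn(128, 64, 3, 3)
y = hashed_conv2d(x, w)                             # (1, 128, 32, 32)
```

The FLOP savings in this sketch come from the reduced depth C' < C of both the feature map and the filters; how aggressively channels collapse is governed by the number of hash bits (more bits yield finer buckets and a more faithful, but less compressed, approximation).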