HALOC: Hardware-Aware Automatic Low-Rank Compression for Compact Neural Networks

Low-rank compression is an important model compression strategy for obtaining compact neural network models. Because the rank values directly determine both model complexity and model accuracy, proper selection of the layer-wise ranks is critical. Although many low-rank compression approaches have been proposed, selecting the ranks either manually or automatically, they suffer from costly manual trials or unsatisfactory compression performance. In addition, none of the existing works is designed in a hardware-aware way, which limits the practical performance of the compressed models on real-world hardware platforms. To address these challenges, in this paper we propose HALOC, a hardware-aware automatic low-rank compression framework. By interpreting automatic rank selection from an architecture search perspective, we develop an end-to-end solution that determines suitable layer-wise ranks in a differentiable and hardware-aware way. We further propose design principles and a mitigation strategy to efficiently explore the rank space and reduce the potential interference problem. Experimental results on different datasets and hardware platforms demonstrate the effectiveness of the proposed approach. On the CIFAR-10 dataset, HALOC achieves 0.07% and 0.38% accuracy increases over the uncompressed ResNet-20 and VGG-16 models with 72.20% and 86.44% fewer FLOPs, respectively. On the ImageNet dataset, HALOC achieves 0.9% higher top-1 accuracy than the original ResNet-18 model with 66.16% fewer FLOPs. HALOC also achieves 0.66% higher top-1 accuracy than the state-of-the-art automatic low-rank compression solution, with lower computational and memory costs. In addition, HALOC delivers practical speedups on different hardware platforms, as verified by measurements on a desktop GPU, an embedded GPU, and an ASIC accelerator.
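
To make concrete why the rank directly determines model complexity, the following minimal sketch counts the weights (equivalently, the multiply-accumulates per input vector) of a single fully connected layer before and after it is factorized into two thinner layers. It illustrates the general low-rank idea only and is not HALOC's rank-selection procedure; the function names and the 512 x 512 layer size are hypothetical choices made for this example.

```python
def dense_cost(m: int, n: int) -> int:
    """Weights (and multiply-accumulates per input) of a dense m x n layer."""
    return m * n


def low_rank_cost(m: int, n: int, r: int) -> int:
    """Cost after replacing W (m x n) with U (m x r) followed by V (r x n)."""
    if not 1 <= r <= min(m, n):
        raise ValueError("rank must satisfy 1 <= r <= min(m, n)")
    return r * (m + n)


def reduction(m: int, n: int, r: int) -> float:
    """Fraction of the dense cost removed by a rank-r factorization."""
    return 1.0 - low_rank_cost(m, n, r) / dense_cost(m, n)


def break_even_rank(m: int, n: int) -> int:
    """Largest rank r that still lowers cost, i.e. r * (m + n) < m * n."""
    return (m * n - 1) // (m + n)


if __name__ == "__main__":
    m, n = 512, 512  # hypothetical layer size, chosen only for illustration
    for r in (16, 64, 128, 256):
        print(f"rank {r:3d}: {reduction(m, n, r):7.2%} of the dense cost removed")
    print("largest rank that still saves cost:", break_even_rank(m, n))
```

In this toy setting a smaller rank always lowers the cost, while any rank above the break-even value makes the factorized layer more expensive than the original. Cost counts of this kind say nothing about accuracy or measured latency on a given device, which is the gap that automatic, hardware-aware rank selection is meant to address.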
