Despite their high accuracy, complex neural networks demand significant computational resources, posing challenges for deployment on resource-constrained devices such as mobile phones and embedded systems. Compression algorithms address these challenges by reducing model size and computational demands while maintaining accuracy. Among these approaches, factorization methods based on tensor decomposition are theoretically sound and effective. However, they face difficulties in selecting an appropriate rank for the decomposition. This paper tackles that issue by presenting a unified framework that simultaneously applies decomposition and optimal rank selection, employing a composite compression loss within defined rank constraints. Our approach performs an automatic rank search in a continuous space, identifying optimal rank configurations without the use of training data, which makes it computationally efficient. Combined with a subsequent fine-tuning step, our approach keeps the performance of highly compressed models on par with their original counterparts. Using various benchmark datasets, we demonstrate the efficacy of our method through a comprehensive analysis.
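To make the idea of factorization-based compression concrete, the following is a minimal sketch (not the paper's method) of compressing a single dense layer with truncated SVD, the simplest instance of low-rank decomposition. The rank `r` here is a hypothetical, hand-picked budget; the abstract's point is precisely that choosing such ranks well is hard and should be automated.

```python
import numpy as np

# Illustrative only: rank-r approximation of one weight matrix via SVD.
rng = np.random.default_rng(0)
W = rng.standard_normal((256, 512))  # stand-in for a trained weight matrix

U, s, Vt = np.linalg.svd(W, full_matrices=False)
r = 32  # assumed rank budget (hand-picked for illustration)
W_low = (U[:, :r] * s[:r]) @ Vt[:r]  # best rank-r approximation of W

# Storing the two factors costs r*(256+512) parameters
# instead of 256*512 for the full matrix.
orig_params = W.size
comp_params = r * (W.shape[0] + W.shape[1])
rel_err = np.linalg.norm(W - W_low) / np.linalg.norm(W)
print(orig_params, comp_params, round(rel_err, 3))
```

A single truncation like this trades accuracy (the relative error above) for a smaller parameter count; a framework as described in the abstract would instead search over such ranks across all layers under a compression loss, then fine-tune.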