Model sparsification in deep learning yields simpler, more interpretable models with fewer parameters, which reduces memory footprint and computational cost and shortens inference time. This work focuses on building sparse models that are optimized for multiple tasks; such parsimonious models can match or even outperform their dense counterparts. We introduce channel-wise l1/l2 group sparsity on the parameters (weights) of the shared convolutional layers of a multi-task learning model. The l1 part of the regularizer removes extraneous groups, i.e., channels, while the l2 part penalizes the weights, which further improves learning efficiency for all tasks. We analyze the effect of group sparsity in both single-task and multi-task settings on two widely used Multi-Task Learning (MTL) datasets: NYU-v2 and CelebAMask-HQ. On both datasets, each of which comprises three different computer vision tasks, multi-task models with approximately 70% sparsity outperform their dense equivalents. We also investigate how the degree of sparsification affects model performance, overall sparsity percentage, sparsity patterns, and inference time.
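To make the channel-wise l1/l2 penalty concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes a convolutional weight stored as nested lists shaped [out_channels][in_channels][kh][kw], with one group per output channel. The function names, the regularization strength `lam`, and the use of a block soft-thresholding (proximal) step to zero out whole channels are illustrative assumptions; the abstract does not specify the optimization procedure.

```python
import math


def channel_norms(weight):
    """L2 norm of every output channel (one group per output channel)."""
    norms = []
    for channel in weight:
        squared = sum(v * v for kernel in channel for row in kernel for v in row)
        norms.append(math.sqrt(squared))
    return norms


def group_sparsity_penalty(weight, lam):
    """l1/l2 penalty: lam times the sum (l1) of per-channel L2 norms."""
    return lam * sum(channel_norms(weight))


def prox_group_l1l2(weight, threshold):
    """Block soft-thresholding (assumed update rule, not from the source).

    Each channel is scaled by max(0, 1 - threshold / norm), so a channel
    whose norm is at most `threshold` becomes exactly zero.
    """
    shrunk = []
    for channel, norm in zip(weight, channel_norms(weight)):
        scale = max(0.0, 1.0 - threshold / norm) if norm > 0.0 else 0.0
        shrunk.append(
            [[[v * scale for v in row] for row in kernel] for kernel in channel]
        )
    return shrunk


def channel_sparsity(weight):
    """Fraction of output channels whose weights are all zero."""
    norms = channel_norms(weight)
    return sum(1 for n in norms if n == 0.0) / len(norms)


if __name__ == "__main__":
    # Hypothetical toy weight: 2 output channels, 1 input channel, 1x2 kernel.
    w = [[[[3.0, 4.0]]], [[[0.3, 0.4]]]]
    print("penalty:", group_sparsity_penalty(w, lam=0.1))
    pruned = prox_group_l1l2(w, threshold=1.0)
    print("channel sparsity after shrinkage:", channel_sparsity(pruned))
```

In this toy example the low-norm channel is removed as a whole while the high-norm channel is only shrunk, which is the structured (channel-level) behaviour that distinguishes group sparsity from element-wise l1 regularization.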