Overparameterized models have proven to be powerful tools for solving a wide range of machine learning tasks. However, overparameterization often substantially increases computational and memory costs, so these models require extensive resources to train. In this work, we aim to reduce this cost by studying the learning dynamics of overparameterized deep networks. Our study of these dynamics reveals that the weight matrices of various architectures exhibit a low-dimensional structure, which implies that a network can be compressed by restricting training to a small subspace. As a step toward a principled approach for compressing deep networks, we study deep linear models. We demonstrate that the principal components of deep linear models are fitted incrementally but within a small subspace, and we use this insight to compress deep linear networks by decreasing the width of their intermediate layers. Remarkably, with a particular choice of initialization, the compressed network converges faster than the original network and consistently yields smaller recovery errors throughout all iterations of gradient descent. We substantiate this observation with a theory for the deep matrix factorization problem and with empirical evaluations on deep matrix sensing. Finally, we demonstrate how our compressed model can enhance the utility of deep nonlinear models. Overall, we observe that our compression technique accelerates training by more than 2x without compromising model quality.
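To make the width-reduction idea concrete, the following NumPy sketch sets up a small deep matrix factorization problem and trains a three-layer linear network whose intermediate layers are narrower than the target matrix. It is only an illustration under stated assumptions: the dimensions, step size, step count, and the small random Gaussian initialization are all hypothetical choices, and the helper names `product` and `train_deep_factorization` are introduced here for convenience. In particular, the "particular choice of initialization" mentioned in the abstract is not specified there and is not reproduced, so this sketch should not be expected to show the reported speedup.

```python
import numpy as np


def product(weights):
    """Return W_L @ ... @ W_1 for weights = [W_1, ..., W_L]."""
    out = weights[0]
    for W in weights[1:]:
        out = W @ out
    return out


def train_deep_factorization(M, widths, steps=3000, lr=0.02, scale=0.1, seed=0):
    """Fit a deep linear network to M by plain gradient descent.

    Minimizes 0.5 * ||W_L ... W_1 - M||_F^2. `widths` lists the
    intermediate layer widths. The small random Gaussian initialization
    is an assumption made for this sketch only.
    """
    rng = np.random.default_rng(seed)
    d_out, d_in = M.shape
    dims = [d_in] + list(widths) + [d_out]
    weights = [
        scale * rng.standard_normal((dims[i + 1], dims[i]))
        for i in range(len(dims) - 1)
    ]
    losses = []
    for _ in range(steps):
        E = product(weights) - M  # residual of the end-to-end matrix
        losses.append(0.5 * np.sum(E ** 2))
        grads = []
        for l in range(len(weights)):
            # Factors applied after (left) and before (right) layer l.
            left = product(weights[l + 1:]) if l + 1 < len(weights) else np.eye(d_out)
            right = product(weights[:l]) if l > 0 else np.eye(d_in)
            grads.append(left.T @ E @ right.T)
        weights = [W - lr * G for W, G in zip(weights, grads)]
    return weights, losses


# Hypothetical problem sizes: a 20 x 20 target of rank 2.
d, r, r_hat = 20, 2, 4
rng = np.random.default_rng(1)
U, _ = np.linalg.qr(rng.standard_normal((d, r)))
V, _ = np.linalg.qr(rng.standard_normal((d, r)))
M = U @ np.diag([2.0, 1.0]) @ V.T

# Compressed network: intermediate widths r_hat instead of d.
weights, losses = train_deep_factorization(M, widths=[r_hat, r_hat])

full_params = 3 * d * d
compressed_params = sum(W.size for W in weights)
print("parameters (full width vs. compressed):", full_params, compressed_params)
print("loss at first and last step:", losses[0], losses[-1])
```

Because the target has rank 2, intermediate layers of width 4 are wide enough to represent it exactly, while the three factors together hold far fewer entries than three full 20 x 20 matrices. Passing `widths=[d, d]` to the same function gives the full-width baseline for comparison.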