Advanced tensor decompositions, such as tensor train (TT), have been widely studied for tensor decomposition-based neural network (NN) training, which is one of the most common model compression methods. However, training NNs with tensor decomposition always suffers from significant accuracy loss and convergence issues. In this paper, a holistic framework is proposed for tensor decomposition-based NN training by formulating TT decomposition-based NN training as a nonconvex optimization problem. This problem is solved by the proposed tensor block coordinate descent (tenBCD) method, which is a gradient-free algorithm. Using the Kurdyka-Łojasiewicz (KŁ) property, the global convergence of tenBCD to a critical point is established at a rate of O(1/k), where k is the number of iterations. The theoretical results can be extended to the popular residual neural networks (ResNets). The effectiveness and efficiency of the proposed framework are verified on an image classification dataset, where the proposed method converges efficiently in training and prevents overfitting.
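For readers unfamiliar with the TT format mentioned above, the following is a minimal sketch of the standard TT-SVD procedure in NumPy. It is not the tenBCD method of this paper; it only illustrates how a tensor is factored into a chain of three-way cores. The function names `tt_svd` and `tt_reconstruct` and the `max_rank` parameter are hypothetical and chosen for illustration.

```python
import numpy as np


def tt_svd(tensor, max_rank):
    """Factor `tensor` into TT cores via sequential truncated SVDs.

    Illustrative sketch of standard TT-SVD (not the paper's tenBCD).
    Core k has shape (r_{k-1}, n_k, r_k), with r_0 = r_d = 1.
    """
    shape = tensor.shape
    cores = []
    rank_prev = 1
    # First unfolding: mode 1 against all remaining modes.
    unfolding = tensor.reshape(rank_prev * shape[0], -1)
    for k in range(len(shape) - 1):
        u, s, vt = np.linalg.svd(unfolding, full_matrices=False)
        rank = min(max_rank, len(s))  # cap the TT rank
        cores.append(u[:, :rank].reshape(rank_prev, shape[k], rank))
        # Carry the remainder forward and fold in the next mode.
        remainder = s[:rank, None] * vt[:rank]
        unfolding = remainder.reshape(rank * shape[k + 1], -1)
        rank_prev = rank
    cores.append(unfolding.reshape(rank_prev, shape[-1], 1))
    return cores


def tt_reconstruct(cores):
    """Contract the TT cores back into a full tensor."""
    result = cores[0]
    for core in cores[1:]:
        # Contract the trailing rank index with the next core's leading one.
        result = np.tensordot(result, core, axes=(result.ndim - 1, 0))
    # Drop the boundary rank dimensions of size 1.
    return result.squeeze(axis=(0, -1))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weights = rng.standard_normal((4, 5, 6))
    cores = tt_svd(weights, max_rank=3)
    approx = tt_reconstruct(cores)
    n_params = sum(core.size for core in cores)
    print("core shapes:", [core.shape for core in cores])
    print("parameters:", n_params, "vs full:", weights.size)
    print("relative error:", np.linalg.norm(approx - weights) / np.linalg.norm(weights))
```

In a compression setting, a layer's weight matrix would first be reshaped into a higher-order tensor, and the cap on the TT ranks then trades parameter count against approximation error.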