We consider a deep matrix factorization model of covariance matrices trained with the Bures-Wasserstein distance. Recent work has made important advances in understanding the optimization problem for overparametrized low-rank matrix approximation, but most of it has focused on discriminative settings and the square loss. In contrast, our model uses a different loss and connects with the generative setting. We characterize the critical points and minimizers of the Bures-Wasserstein distance over the space of rank-bounded matrices. For low-rank matrices the Hessian of this loss can, in theory, blow up, which makes it challenging to analyze the convergence of optimization methods. We establish convergence results for gradient flow using a smooth perturbative version of the loss, and convergence results for finite-step-size gradient descent under certain assumptions on the initial weights.
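For concreteness, the loss discussed in the abstract can be sketched numerically. The squared Bures-Wasserstein distance between two positive semidefinite matrices A and B is commonly written as tr(A) + tr(B) - 2 tr((A^{1/2} B A^{1/2})^{1/2}). The snippet below is a minimal illustration under assumptions that are not taken from the abstract: the helper names (`psd_sqrt`, `bures_wasserstein_sq`, `deep_factor_covariance`) are hypothetical, and the parametrization Sigma = W W^T with W = W_N ... W_1 is one plausible reading of "deep matrix factorization of covariance matrices", not necessarily the authors' exact model.

```python
import numpy as np

def psd_sqrt(M):
    """Square root of a symmetric PSD matrix via eigendecomposition."""
    M = (M + M.T) / 2.0                 # symmetrize against round-off
    w, V = np.linalg.eigh(M)
    w = np.clip(w, 0.0, None)           # clip tiny negative eigenvalues
    return (V * np.sqrt(w)) @ V.T

def bures_wasserstein_sq(A, B):
    """Squared Bures-Wasserstein distance between PSD matrices A and B."""
    root_A = psd_sqrt(A)
    cross = psd_sqrt(root_A @ B @ root_A)
    return float(np.trace(A) + np.trace(B) - 2.0 * np.trace(cross))

def deep_factor_covariance(weights):
    """Assumed parametrization: Sigma = W W^T with W = W_N ... W_1.

    `weights` is the list [W_1, ..., W_N]; this form is an illustrative
    assumption, not a definition quoted from the abstract.
    """
    W = weights[0]
    for W_i in weights[1:]:
        W = W_i @ W
    return W @ W.T

# Usage: loss of a random two-layer factorization against a target covariance.
rng = np.random.default_rng(0)
target = np.diag([4.0, 1.0, 0.25])
weights = [rng.standard_normal((2, 3)), rng.standard_normal((3, 2))]
sigma = deep_factor_covariance(weights)   # 3x3, rank at most 2
loss = bures_wasserstein_sq(sigma, target)
```

The matrix square root is not differentiable at singular arguments, which is consistent with the abstract's remark that the Hessian of this loss can blow up for low-rank matrices. For commuting (for example, diagonal) inputs the formula reduces to the sum of squared differences of the square roots of the eigenvalues, which gives a simple sanity check.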