We consider a deep matrix factorization model of covariance matrices trained with the Bures-Wasserstein distance. While recent works have made important advances in the study of the optimization problem for overparametrized low-rank matrix approximation, much of the emphasis has been on discriminative settings and the square loss. In contrast, our model considers a different type of loss and connects with the generative setting. We characterize the critical points and minimizers of the Bures-Wasserstein distance over the space of rank-bounded matrices. For low-rank matrices the Hessian of this loss can theoretically blow up, which makes it challenging to analyze the convergence of optimization methods. We establish convergence results for gradient flow using a smooth perturbative version of the loss, and convergence results for finite step size gradient descent under certain assumptions on the initial weights.
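For concreteness, the loss discussed above can be sketched numerically. The snippet below is a minimal illustration, not the authors' implementation: it uses the standard closed form of the squared Bures-Wasserstein distance between positive semidefinite matrices, B²(Σ₀, Σ) = tr Σ₀ + tr Σ − 2 tr (Σ₀^{1/2} Σ Σ₀^{1/2})^{1/2}. The function names and the parametrization Σ = W Wᵀ with W = W_N ⋯ W_1 are assumptions made here for illustration, since the abstract does not spell them out.

```python
import numpy as np


def psd_sqrt(mat):
    """Symmetric square root of a PSD matrix via eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    # Clip tiny negative eigenvalues caused by round-off.
    eigvals = np.clip(eigvals, 0.0, None)
    return (eigvecs * np.sqrt(eigvals)) @ eigvecs.T


def bures_wasserstein_sq(sigma0, sigma):
    """Squared Bures-Wasserstein distance between two PSD matrices."""
    root = psd_sqrt(sigma0)
    cross = psd_sqrt(root @ sigma @ root)
    return float(np.trace(sigma0) + np.trace(sigma) - 2.0 * np.trace(cross))


def deep_covariance(weights):
    """Assumed parametrization: Sigma = W W^T with W = W_N ... W_1.

    `weights` is the list [W_1, ..., W_N]; this form is an illustrative
    assumption, not taken from the abstract.
    """
    product = weights[0]
    for w in weights[1:]:
        product = w @ product
    return product @ product.T


if __name__ == "__main__":
    target = np.eye(2)
    layers = [np.eye(2), 2.0 * np.eye(2)]  # hypothetical two-layer weights
    print(bures_wasserstein_sq(target, deep_covariance(layers)))
```

The eigendecomposition-based square root is used here because it stays well defined for rank-deficient matrices, which is the regime the abstract highlights; the distance itself remains finite there even though its derivatives may not.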