Minimax optimization over Riemannian manifolds (which may impose nonconvex constraints) has been applied to many problems, such as robust dimensionality reduction and deep neural networks with orthogonal weights (the Stiefel manifold). Although many minimax optimization algorithms have been developed in the Euclidean setting, they are difficult to extend to the Riemannian setting, and algorithms for nonconvex minimax problems with nonconvex constraints are rarer still. Meanwhile, to address big data challenges, decentralized (serverless) training techniques have recently emerged, since they reduce communication overhead and avoid the bottleneck at the server node. Nonetheless, algorithms for decentralized Riemannian minimax problems have not been studied. In this paper, we study the distributed nonconvex-strongly-concave minimax optimization problem over the Stiefel manifold, which is a nonconvex set, and propose both deterministic and stochastic minimax methods. The global function is represented as a finite sum of local functions. For the deterministic setting, we propose DRGDA and prove that it achieves a gradient complexity of $O(\epsilon^{-2})$ under mild conditions. For the stochastic setting, we propose DRSGDA and prove that it achieves a gradient complexity of $O(\epsilon^{-4})$. DRGDA and DRSGDA are the first algorithms for distributed minimax optimization with nonconvex constraints that achieve exact convergence. Extensive experiments on training deep neural networks (DNNs) over the Stiefel manifold demonstrate the efficiency of our algorithms.
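As a point of reference, the problem class described above can be written out explicitly. The following is a sketch under assumptions, not necessarily the paper's exact formulation: the symbols $n$ (number of nodes), $f_i$ (local function held by node $i$), $d$ and $r$ (matrix dimensions), and $\mathcal{Y}$ (feasible set of the maximization variable, assumed here to be closed and convex) do not appear in the abstract, and the $1/n$ normalization of the finite sum is likewise an assumption.

$$
\min_{x \in \mathrm{St}(d,r)} \; \max_{y \in \mathcal{Y}} \; f(x,y) := \frac{1}{n}\sum_{i=1}^{n} f_i(x,y),
\qquad
\mathrm{St}(d,r) := \{\, x \in \mathbb{R}^{d \times r} : x^{\top} x = I_r \,\}.
$$

Here each $f_i$ is nonconvex in $x$ and strongly concave in $y$. The constraint set is nonconvex because, for example, both $x$ and $-x$ lie in $\mathrm{St}(d,r)$ while their midpoint, the zero matrix, does not.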