Minimax optimization over Riemannian manifolds (possibly with nonconvex constraints) has been actively applied to many problems, such as robust dimensionality reduction and deep neural networks with orthogonal weights (the Stiefel manifold). Although many optimization algorithms for minimax problems have been developed in the Euclidean setting, they are difficult to adapt to the Riemannian setting, and algorithms for nonconvex minimax problems with nonconvex constraints are rarer still. Meanwhile, to address the challenges of big data, decentralized (serverless) training techniques have recently emerged, since they can reduce communication overhead and avoid the bottleneck at the server node. Nonetheless, algorithms for decentralized Riemannian minimax problems have not been studied. In this paper, we study the distributed nonconvex-strongly-concave minimax optimization problem over the Stiefel manifold and propose both deterministic and stochastic minimax methods. Each local objective is nonconvex-strongly-concave, and the Stiefel manifold is a nonconvex set. The global function is represented as the finite sum of the local functions. For the deterministic setting, we propose DRGDA and prove that it achieves a gradient complexity of $O(\epsilon^{-2})$ under mild conditions. For the stochastic setting, we propose DRSGDA and prove that it achieves a gradient complexity of $O(\epsilon^{-4})$. DRGDA and DRSGDA are the first algorithms to achieve exact convergence for distributed minimax optimization with nonconvex constraints. Extensive experiments on training deep neural networks (DNNs) over the Stiefel manifold demonstrate the efficiency of our algorithms.
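For concreteness, the problem class described above can be written roughly as follows. This is a sketch under assumptions that the abstract does not state: the symbols $n$ (number of nodes), $f_i$ (local objective held by node $i$), $\mathrm{St}(d,r)$ (the Stiefel manifold of $d \times r$ matrices with orthonormal columns), and $\mathcal{Y}$ (a feasible set for the maximization variable) are illustrative notation, and the $1/n$ normalization is only one common convention for a finite sum.

$$
\min_{x \in \mathrm{St}(d,r)} \; \max_{y \in \mathcal{Y}} \; f(x,y) = \frac{1}{n} \sum_{i=1}^{n} f_i(x,y),
\qquad
\mathrm{St}(d,r) = \{\, x \in \mathbb{R}^{d \times r} : x^\top x = I_r \,\}
$$

In this notation, each $f_i(x,\cdot)$ is strongly concave in $y$, while $f_i(\cdot,y)$ may be nonconvex in $x$; the constraint $x^\top x = I_r$ is what makes the feasible set for $x$ nonconvex.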