Compositional minimax optimization is a pivotal yet under-explored problem in machine learning, with applications including distributionally robust training and policy evaluation for reinforcement learning. Existing methods either have suboptimal sample complexity or rely heavily on large batch sizes. This paper proposes Nested STOchastic Recursive Momentum (NSTORM), which attains the optimal sample complexity of $O(\kappa^3/\epsilon^3)$ for finding an $\epsilon$-accurate solution. However, NSTORM requires small learning rates, which may limit its applicability. We therefore introduce ADAptive NSTORM (ADA-NSTORM), which uses adaptive learning rates; we prove that it achieves the same sample complexity, and experiments show that it is more effective in practice. Both methods match the lower bounds for minimax optimization without requiring large batches, and extensive experiments validate these results. This work advances compositional minimax optimization, a key capability for distributional robustness and policy evaluation.