Bilevel optimization problems, in which one optimization problem is nested inside another, have found a growing number of applications in machine learning. In many practical cases, both the upper-level and the lower-level objectives correspond to empirical risk minimization problems and therefore have a finite-sum structure. In this context, we propose a bilevel extension of the celebrated SARAH algorithm. We demonstrate that the algorithm requires $\mathcal{O}((n+m)^{\frac12}\varepsilon^{-1})$ oracle calls to achieve $\varepsilon$-stationarity, where $n+m$ is the total number of samples, which improves over all previous bilevel algorithms. Moreover, we provide a lower bound on the number of oracle calls required to obtain an approximate stationary point of the objective function of the bilevel problem. This lower bound is attained by our algorithm, making it optimal in terms of sample complexity.
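For context, the single-level SARAH estimator that the bilevel extension builds on maintains a recursive, variance-reduced gradient estimate $v_t = \nabla f_{i_t}(x_t) - \nabla f_{i_t}(x_{t-1}) + v_{t-1}$, reset to a full gradient at the start of each outer loop. The sketch below illustrates this base algorithm only, not the bilevel method of the abstract; the function and parameter names are illustrative.

```python
import numpy as np

def sarah(grad_i, x0, n, step, n_epochs, inner_len, rng):
    """Minimal single-level SARAH sketch for min_x (1/n) sum_i f_i(x).

    grad_i(x, i) returns the gradient of the i-th component f_i at x.
    This is the classical recursive estimator, not the bilevel variant.
    """
    x = x0.copy()
    for _ in range(n_epochs):
        # Outer loop: reset the estimator to the full gradient.
        v = np.mean([grad_i(x, i) for i in range(n)], axis=0)
        x_prev, x = x, x - step * v
        for _ in range(inner_len):
            # Inner loop: recursive update with a single random sample.
            i = rng.integers(n)
            v = grad_i(x, i) - grad_i(x_prev, i) + v
            x_prev, x = x, x - step * v
    return x
```

On a toy quadratic such as $f_i(x) = \frac12\|x - c_i\|^2$, the iterates converge to the mean of the $c_i$; the key design point is that, unlike SVRG, the inner update recurses on the previous estimate $v_{t-1}$ rather than on a fixed snapshot gradient.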