We consider stochastic unconstrained bilevel optimization problems in which only first-order gradient oracles are available. Although numerous methods have been proposed for bilevel problems, existing methods either require potentially expensive computations involving Hessians of the lower-level objective or lack rigorous finite-time performance guarantees. In this work, we propose a Fully First-order Stochastic Approximation (F2SA) method and study its non-asymptotic convergence properties. Specifically, we show that F2SA converges to an $\epsilon$-stationary solution of the bilevel problem after $\epsilon^{-7/2}$, $\epsilon^{-5/2}$, and $\epsilon^{-3/2}$ iterations (each iteration using $O(1)$ samples) when stochastic noise is present in both the upper- and lower-level objectives, only in the upper-level objective, and in neither objective (the deterministic setting), respectively. We further show that with momentum-assisted gradient estimators, the iteration complexities improve to $\epsilon^{-5/2}$, $\epsilon^{-4/2}$, and $\epsilon^{-3/2}$, respectively. On MNIST data-hypercleaning experiments, the proposed method also outperforms existing second-order based approaches in practice.
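To make the phrase "fully first-order" concrete, the following is a minimal sketch of one way a bilevel problem can be attacked with gradients only. It is an illustration under stated assumptions, not the F2SA update rule, which the abstract does not spell out. The sketch assumes a penalty reformulation $f(x, y) + \lambda\,(g(x, y) - g(x, z))$ with a fixed penalty weight $\lambda$, a deterministic setting (no noise), and a hypothetical one-dimensional toy problem; the function names, step sizes, and the value of `lam` are all chosen here for illustration.

```python
def grad_f(x, y):
    """Gradient (df/dx, df/dy) of the toy upper-level objective
    f(x, y) = 0.5 * x**2 + 0.5 * (y - 1)**2 (an assumption for illustration)."""
    return x, y - 1.0


def grad_g(x, y):
    """Gradient (dg/dx, dg/dy) of the toy lower-level objective
    g(x, y) = 0.5 * (y - x)**2, whose minimizer in y is y*(x) = x."""
    return x - y, y - x


def penalty_first_order_bilevel(lam=100.0, steps=3000,
                                alpha=0.005, gamma=0.5, beta=0.01):
    """Gradient-only bilevel sketch with a fixed penalty weight lam.

    No Hessian or Hessian-vector product of g is ever formed; every
    update uses only first-order gradients of f and g.
    """
    x = y = z = 0.0
    for _ in range(steps):
        # z tracks the lower-level minimizer argmin_z g(x, z).
        z -= gamma * grad_g(x, z)[1]
        # y tracks the minimizer of the penalized inner problem
        # f(x, .) + lam * g(x, .).
        y -= alpha * (grad_f(x, y)[1] + lam * grad_g(x, y)[1])
        # x takes a gradient step on f + lam * (g(x, y) - g(x, z)).
        fx = grad_f(x, y)[0]
        x -= beta * (fx + lam * (grad_g(x, y)[0] - grad_g(x, z)[0]))
    return x, y, z


if __name__ == "__main__":
    x, y, z = penalty_first_order_bilevel()
    print(f"x = {x:.4f}, y = {y:.4f}, z = {z:.4f}")
```

For this toy problem the true bilevel objective is $F(x) = 0.5x^2 + 0.5(x - 1)^2$, minimized at $x = 0.5$. With a fixed $\lambda = 100$, the sketch settles at $x = 100/201 \approx 0.4975$, so the penalty introduces a small bias that shrinks as $\lambda$ grows. That trade-off between the size of $\lambda$ and the usable step sizes is the kind of effect a non-asymptotic analysis has to account for.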