We analyse a general class of bilevel problems, in which the upper-level problem consists in the minimization of a smooth objective function and the lower-level problem is to find the fixed point of a smooth contraction map. This class of problems includes instances of meta-learning, equilibrium models, hyperparameter optimization and data poisoning adversarial attacks. Several recent works have proposed algorithms which warm-start the lower-level problem, i.e. they use the previous lower-level approximate solution as a starting point for the lower-level solver. This warm-start procedure allows one to improve the sample complexity in both the stochastic and deterministic settings, achieving in some cases the order-wise optimal sample complexity. However, there are situations, e.g., meta-learning and equilibrium models, in which the warm-start procedure is ill-suited or ineffective. In this work we show that without warm-start, it is still possible to achieve order-wise (near) optimal sample complexity. In particular, we propose a simple method that uses (stochastic) fixed-point iterations at the lower level and projected inexact gradient descent at the upper level, and that reaches an $\epsilon$-stationary point using $O(\epsilon^{-2})$ and $\tilde{O}(\epsilon^{-1})$ samples for the stochastic and the deterministic setting, respectively. Finally, compared to methods using warm-start, our approach yields a simpler analysis that does not need to study the coupled interactions between the upper-level and lower-level iterates.
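To make the structure of the method concrete, the following is a minimal deterministic sketch on a scalar toy problem. It is not the authors' implementation. The function name, its arguments, the toy map $\Phi(w, \lambda) = 0.5\,w + \lambda$, and the objective $E(w, \lambda) = \tfrac{1}{2}(w - 0.4)^2$ are all hypothetical choices made for illustration. The abstract does not say how the inexact gradient is computed. This sketch assumes an implicit-differentiation estimate, in which a second fixed-point iteration approximately solves the associated linear system. The lower-level solver restarts from zero at every upper-level step, i.e. there is no warm-start.

```python
def projected_bilevel_descent(phi, dphi_dw, dphi_dlam, dE_dw, dE_dlam,
                              lam0, lo, hi,
                              outer_steps=100, inner_steps=50, step_size=0.1):
    """Scalar sketch: cold-started fixed-point iterations at the lower level,
    projected inexact gradient descent at the upper level."""
    lam = lam0
    for _ in range(outer_steps):
        # Lower level: approximate the fixed point w = phi(w, lam),
        # always restarting from 0 (no warm-start).
        w = 0.0
        for _ in range(inner_steps):
            w = phi(w, lam)

        # Assumed hypergradient estimator: iterate v <- dphi_dw * v + dE_dw,
        # whose fixed point is v = dE_dw / (1 - dphi_dw).
        v = 0.0
        for _ in range(inner_steps):
            v = dphi_dw(w, lam) * v + dE_dw(w, lam)
        hypergrad = dE_dlam(w, lam) + dphi_dlam(w, lam) * v

        # Upper level: gradient step followed by projection onto [lo, hi].
        lam = min(max(lam - step_size * hypergrad, lo), hi)
    return lam


# Hypothetical toy instance: phi is a contraction in w with modulus 0.5.
TARGET = 0.4
lam_hat = projected_bilevel_descent(
    phi=lambda w, lam: 0.5 * w + lam,
    dphi_dw=lambda w, lam: 0.5,
    dphi_dlam=lambda w, lam: 1.0,
    dE_dw=lambda w, lam: w - TARGET,
    dE_dlam=lambda w, lam: 0.0,
    lam0=0.0, lo=-1.0, hi=1.0,
)
```

In this toy instance the exact fixed point is $w(\lambda) = 2\lambda$, so the upper-level objective is $\tfrac{1}{2}(2\lambda - 0.4)^2$ and its minimizer over $[-1, 1]$ is $\lambda = 0.2$, which `lam_hat` should approximate. In the stochastic setting, the maps and derivatives would be replaced by sampled estimates. The sketch does not cover that case.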