(Stochastic) bilevel optimization is a frequently encountered problem in machine learning, with applications including meta-learning, hyper-parameter optimization, and reinforcement learning. Most existing studies of this problem have focused only on analyzing convergence or improving the convergence rate, while little effort has been devoted to understanding its generalization behavior. In this paper, we conduct a thorough analysis of the generalization of first-order (gradient-based) methods for the bilevel optimization problem. We first establish a fundamental connection between algorithmic stability and generalization error in different forms, and give a high-probability generalization bound that improves the previous best one from $\bigO(\sqrt{n})$ to $\bigO(\log n)$, where $n$ is the sample size. We then provide the first stability bounds for the general case in which both the inner- and outer-level parameters are subject to continuous update, whereas existing work allows only the outer-level parameter to be updated. Our analysis applies to various standard settings, namely strongly-convex-strongly-convex (SC-SC), convex-convex (C-C), and nonconvex-nonconvex (NC-NC). The analysis for the NC-NC setting can also be extended to a particular nonconvex-strongly-convex (NC-SC) setting that is commonly encountered in practice. Finally, we corroborate our theoretical analysis with experiments on meta-learning and hyper-parameter optimization, which also demonstrate how the number of iterations affects the generalization error.
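To make the notions of "both levels updated" and "algorithmic stability" concrete, the following is a minimal sketch rather than the algorithm analyzed in the paper. It assumes a toy linear model whose weights are split into an outer block `x` and an inner block `y`, a squared loss at both levels, and a small ridge term `mu` on the inner level; the function names, step sizes, and data model are all hypothetical. It runs the same first-order procedure on two validation sets that differ in one sample and reports how far the learned parameters drift apart, which is the quantity that stability bounds control.

```python
import numpy as np

def bilevel_sgd(val, train, steps=500, alpha=0.02, beta=0.02, mu=0.1, seed=0):
    """Update outer x and inner y at every step (toy setup, assumed here)."""
    (U_val, V_val, t_val), (U_tr, V_tr, t_tr) = val, train
    rng = np.random.default_rng(seed)
    x = np.zeros(U_val.shape[1])  # outer-level parameter
    y = np.zeros(V_val.shape[1])  # inner-level parameter
    for _ in range(steps):
        i = rng.integers(len(t_val))  # one validation sample for the outer loss
        j = rng.integers(len(t_tr))   # one training sample for the inner loss
        r_out = U_val[i] @ x + V_val[i] @ y - t_val[i]
        r_in = U_tr[j] @ x + V_tr[j] @ y - t_tr[j]
        grad_x = r_out * U_val[i]          # d/dx of 0.5 * r_out**2
        grad_y = r_in * V_tr[j] + mu * y   # d/dy of 0.5 * r_in**2 + 0.5 * mu * |y|^2
        # Both levels move in the same iteration, using the old (x, y).
        x, y = x - alpha * grad_x, y - beta * grad_y
    return x, y

def make_data(n, rng, d=3):
    """Synthetic regression data; purely illustrative."""
    U = rng.normal(size=(n, d))
    V = rng.normal(size=(n, d))
    t = U @ np.ones(d) + V @ np.ones(d) + 0.1 * rng.normal(size=n)
    return U, V, t

def parameter_gap(n=50, seed=0):
    """Distance between outputs on two validation sets differing in one sample."""
    rng = np.random.default_rng(seed)
    val, train = make_data(n, rng), make_data(n, rng)
    U2, V2, t2 = (a.copy() for a in val)
    t2[0] += 5.0  # perturb a single validation target to get a neighbouring set
    x1, y1 = bilevel_sgd(val, train, seed=seed)
    x2, y2 = bilevel_sgd((U2, V2, t2), train, seed=seed)  # same sampling order
    return float(np.linalg.norm(x1 - x2) + np.linalg.norm(y1 - y2))

if __name__ == "__main__":
    print("parameter gap:", parameter_gap())
```

Calling `parameter_gap` with different values of `n`, or varying `steps` inside `bilevel_sgd`, gives a rough empirical view of how sample size and iteration count influence this gap in the toy setting; it does not reproduce the paper's experiments.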