In this paper, we consider non-convex multi-block bilevel optimization (MBBO) problems, which involve $m\gg 1$ lower-level problems and have important applications in machine learning. Designing a stochastic gradient estimator and controlling its variance is more intricate than in the single-block setting, owing to the hierarchical sampling of blocks and data and to the difficulty of estimating the hypergradient. We aim for an algorithm with three properties: (a) matching the state-of-the-art complexity of standard bilevel optimization (BO) problems with a single block; (b) achieving parallel speedup by sampling $I$ blocks, and $B$ data samples for each sampled block, per iteration; (c) avoiding the computation of the inverse of a high-dimensional Hessian matrix estimator. Achieving all three simultaneously is non-trivial: existing works achieve only one or two of these properties. To address the challenges involved in achieving (a), (b), and (c), we propose two stochastic algorithms that use advanced blockwise variance-reduction techniques to track the Hessian matrices (for low-dimensional problems) or the Hessian-vector products (for high-dimensional problems), and we prove an iteration complexity of $O(\frac{m\epsilon^{-3}\mathbb{I}(I<m)}{I\sqrt{I}} + \frac{m\epsilon^{-3}}{I\sqrt{B}})$ for finding an $\epsilon$-stationary point under appropriate conditions. We also conduct experiments that verify the effectiveness of the proposed algorithms in comparison with existing MBBO algorithms.
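To make the parallel-speedup claim in (b) concrete, the following minimal Python sketch evaluates the two terms of the stated iteration-complexity bound for given $m$, $\epsilon$, $I$, and $B$. It is an illustration only: the constant hidden by the $O(\cdot)$ notation is ignored, so only ratios between settings are meaningful, and the function name `mbbo_iteration_bound` is a hypothetical helper, not part of the proposed algorithms.

```python
import math


def mbbo_iteration_bound(m: int, eps: float, I: int, B: int) -> float:
    """Evaluate the stated bound, ignoring the constant hidden by O(.).

    m:   number of lower-level problems (blocks)
    eps: target accuracy of the eps-stationary point
    I:   number of blocks sampled per iteration (1 <= I <= m)
    B:   number of data samples drawn per sampled block (B >= 1)
    """
    if not (1 <= I <= m) or B < 1 or eps <= 0:
        raise ValueError("require 1 <= I <= m, B >= 1, eps > 0")
    # Block-sampling term: the indicator I(I < m) makes it vanish
    # when every block is sampled at each iteration.
    block_term = m * eps ** -3 / (I * math.sqrt(I)) if I < m else 0.0
    # Data-sampling term: shrinks with both I and sqrt(B).
    data_term = m * eps ** -3 / (I * math.sqrt(B))
    return block_term + data_term
```

Under these assumptions, with $m = 100$, moving from $I = B = 1$ to $I = B = 4$ divides the evaluated bound by $4\sqrt{4} = 8$, and setting $I = m$ removes the first term entirely.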