In this paper, we consider non-convex multi-block bilevel optimization (MBBO) problems, which involve $m\gg 1$ lower-level problems and have important applications in machine learning. Designing a stochastic gradient estimator and controlling its variance is more intricate here because of the hierarchical sampling of blocks and data and the unique challenge of estimating the hyper-gradient. We aim for our algorithm to have three desirable properties: (a) matching the state-of-the-art complexity of standard BO problems with a single block; (b) achieving parallel speedup by sampling $I$ blocks and $B$ samples for each sampled block per iteration; (c) avoiding the computation of the inverse of a high-dimensional Hessian matrix estimator. Achieving all three is non-trivial: existing works achieve only one or two of these properties. To address the challenges involved in achieving (a, b, c), we propose two stochastic algorithms that use advanced blockwise variance-reduction techniques to track the Hessian matrices (for low-dimensional problems) or the Hessian-vector products (for high-dimensional problems), and we prove an iteration complexity of $O(\frac{m\epsilon^{-3}\mathbb{I}(I<m)}{I\sqrt{I}} + \frac{m\epsilon^{-3}}{I\sqrt{B}})$ for finding an $\epsilon$-stationary point under appropriate conditions. We also conduct experiments to verify the effectiveness of the proposed algorithms compared with existing MBBO algorithms.
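To make the scaling in the stated complexity bound concrete, the following Python sketch evaluates the expression inside $O(\cdot)$ for given $m$, $I$, $B$ and $\epsilon$. It is a minimal illustration of the arithmetic only, assuming that the constants and problem-dependent factors hidden by $O(\cdot)$ can be dropped; the function name `mbbo_iteration_bound` is hypothetical and is not part of the proposed algorithms.

```python
import math


def mbbo_iteration_bound(m: int, I: int, B: int, eps: float) -> float:
    """Evaluate m*eps^-3*1(I<m)/(I*sqrt(I)) + m*eps^-3/(I*sqrt(B)).

    Hypothetical helper: constants hidden by the O(.) notation are ignored,
    so only relative comparisons between settings are meaningful.
    """
    if not (1 <= I <= m):
        raise ValueError("need 1 <= I <= m")
    if B < 1 or eps <= 0:
        raise ValueError("need B >= 1 and eps > 0")
    # Indicator 1(I < m): this term vanishes when every block is sampled.
    indicator = 1.0 if I < m else 0.0
    block_term = m * eps**-3 * indicator / (I * math.sqrt(I))
    # Term that shrinks as more samples B are drawn per sampled block.
    data_term = m * eps**-3 / (I * math.sqrt(B))
    return block_term + data_term


# Compare a few sampling configurations for m = 100 blocks.
for I in (25, 50, 100):
    print(I, mbbo_iteration_bound(m=100, I=I, B=4, eps=0.5))
```

For example, with $m=100$, $B=4$ and $\epsilon=0.5$, sampling all blocks ($I=m$) removes the indicator term and the expression equals 4.0, while with $I=25$ both terms contribute. Increasing $I$ or $B$ shrinks the value, which reflects the parallel speedup described in property (b).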