Stochastic bilevel optimization (SBO) has become a standard framework for hyperparameter learning, data reweighting, representation learning, and data-mixture optimization in deep learning. Existing exact single-loop SBO methods and memory-efficient surrogate SBO methods either impose severe memory pressure when the lower-level problem is a large neural network or lack competitive convergence guarantees under standard assumptions. In this paper, we propose BROS, a memory-efficient single-loop SBO method whose convergence rate has the same order as that of exact single-loop SBO methods. BROS performs the lower-level and auxiliary updates in randomized subspaces and applies a Rademacher bi-probe correction that recovers an unbiased estimator of the Hessian action. We prove that, under standard assumptions alone, BROS preserves the $\mathcal O(\varepsilon^{-2})$ sample complexity of MA-SOBA for finding an $\varepsilon$-stationary point. Experiments on hyper-data cleaning, data-mixture learning, hyper-representation learning, and ViT sample reweighting show that BROS reduces peak memory by up to 44.9% while closely matching the performance of full-space baselines.
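The abstract does not spell out how the Rademacher bi-probe correction is built. The sketch below is a hedged illustration of the general principle only and is not BROS's estimator. It assumes two independent $d \times r$ Rademacher sketch matrices $P_1, P_2$, each scaled by $1/\sqrt{r}$ so that $\mathbb E[PP^\top] = I$. Under that assumption, independence gives $\mathbb E[P_1P_1^\top H P_2P_2^\top u] = Hu$, whereas reusing a single sketch on both sides of $H$ is biased. All function names are hypothetical. The explicit Hessian $H$ exists only so the check is easy to follow, so the example demonstrates unbiasedness, not memory savings.

```python
import numpy as np

def rademacher_sketches(n, d, r, rng):
    """Hypothetical helper: n independent d x r sketches with i.i.d. +/-1 entries / sqrt(r).

    Diagonal entries of P @ P.T are exactly 1 and off-diagonal entries have mean 0,
    so E[P @ P.T] = I_d.
    """
    return rng.choice([-1.0, 1.0], size=(n, d, r)) / np.sqrt(r)

def apply_sketch(P, X):
    """Compute P_n @ P_n.T @ x_n for every sample n (P: (n, d, r), X: (n, d))."""
    return np.einsum("ndr,ner,ne->nd", P, P, X)

def two_probe_estimates(H, u, r, n, rng):
    """Per-sample estimates P1 P1^T H P2 P2^T u with independent P1, P2 (unbiased for H u)."""
    d = H.shape[0]
    P1 = rademacher_sketches(n, d, r, rng)
    P2 = rademacher_sketches(n, d, r, rng)
    U = np.broadcast_to(u, (n, d))
    HV = apply_sketch(P2, U) @ H.T  # row n holds H @ (P2 P2^T u)
    return apply_sketch(P1, HV)

def same_probe_estimates(H, u, r, n, rng):
    """Per-sample estimates P P^T H P P^T u reusing ONE sketch (biased in general)."""
    d = H.shape[0]
    P = rademacher_sketches(n, d, r, rng)
    U = np.broadcast_to(u, (n, d))
    HV = apply_sketch(P, U) @ H.T
    return apply_sketch(P, HV)

# Usage: compare Monte Carlo means against the exact Hessian action.
rng = np.random.default_rng(0)
H = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
u = np.array([1.0, -1.0, 2.0])
exact = H @ u  # exact H @ u is [1, 0, 7]
two = two_probe_estimates(H, u, r=2, n=50_000, rng=rng).mean(axis=0)
same = same_probe_estimates(H, u, r=2, n=50_000, rng=rng).mean(axis=0)
print("exact:", exact)
print("two independent probes:", two)
print("one reused probe:", same)
```

With a single reused sketch, fourth moments of the Rademacher entries enter the expectation and add terms involving $H$, $\operatorname{tr}(H)$, and $\operatorname{diag}(H)$ scaled by $1/r$. Two independent sketches avoid this because the expectation factorizes.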