We develop and analyze stochastic approximation algorithms for solving nested compositional bi-level optimization problems. These problems involve a nested composition of $T$ potentially non-convex smooth functions at the upper level and a smooth, strongly convex function at the lower level. Our proposed algorithm does not rely on matrix inversions or mini-batches. It can achieve an $\epsilon$-stationary solution with an oracle complexity of approximately $\tilde{O}_T(1/\epsilon^{2})$, assuming access to stochastic first-order oracles for the individual functions in the composition and for the lower-level function, where these oracles are unbiased and have bounded moments. Here, $\tilde{O}_T$ hides polylogarithmic factors and constants that depend on $T$. The key challenge in establishing this result is handling three distinct sources of bias in the stochastic gradients. The first arises from the compositional nature of the upper level, the second stems from the bi-level structure, and the third comes from the Neumann series approximations used to avoid matrix inversion. To demonstrate the effectiveness of our approach, we apply it to robust feature learning for deep neural networks under covariate shift and illustrate the benefits of our methodology in that setting.
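To illustrate how a Neumann series can replace a matrix inversion in a bi-level hypergradient, here is a minimal deterministic sketch. It is not the authors' stochastic estimator. It assumes exact oracles and a hypothetical toy problem with upper-level $f(x, y) = \tfrac12\|y - b\|^2$ and strongly convex lower-level $g(x, y) = \tfrac12 y^\top H y - y^\top C x$. The hypergradient $\nabla F(x) = \nabla_x f - \nabla^2_{xy} g\,[\nabla^2_{yy} g]^{-1} \nabla_y f$ uses the truncated series $H^{-1} v \approx \eta \sum_{k=0}^{K-1} (I - \eta H)^k v$, which converges when the eigenvalues of $H$ lie in $[\mu, L]$ with $\mu > 0$ and $0 < \eta \le 1/L$. The function names `neumann_inverse_hvp` and `toy_hypergradient` are illustrative only.

```python
import numpy as np


def neumann_inverse_hvp(hvp, v, eta, num_terms):
    """Approximate H^{-1} v by eta * sum_{k=0}^{num_terms-1} (I - eta*H)^k v.

    Only Hessian-vector products hvp(u) = H @ u are used, never an explicit
    inverse. Converges if H is symmetric with eigenvalues in [mu, L], mu > 0,
    and 0 < eta <= 1/L, because then ||I - eta*H|| <= 1 - eta*mu < 1.
    """
    term = np.array(v, dtype=float)        # (I - eta*H)^0 v
    total = np.zeros_like(term)
    for _ in range(num_terms):
        total += term
        term = term - eta * hvp(term)      # multiply by (I - eta*H)
    return eta * total


def toy_hypergradient(x, H, C, b, eta, num_terms):
    """Hypergradient of F(x) = f(x, y*(x)) for the hypothetical toy problem
    f(x, y) = 0.5*||y - b||^2,  g(x, y) = 0.5*y^T H y - y^T C x.
    """
    # Inversion-free lower-level solve: gradient descent on g(x, .),
    # whose gradient in y is H y - C x.
    y = np.zeros(H.shape[0])
    for _ in range(num_terms):
        y = y - eta * (H @ y - C @ x)
    grad_y_f = y - b                       # nabla_y f; here nabla_x f = 0
    # nabla^2_{xy} g = -C^T, so nabla F = C^T H^{-1} nabla_y f.
    w = neumann_inverse_hvp(lambda u: H @ u, grad_y_f, eta, num_terms)
    return C.T @ w


rng = np.random.default_rng(0)
d_y, d_x = 5, 3
Q, _ = np.linalg.qr(rng.standard_normal((d_y, d_y)))
H = Q @ np.diag(np.linspace(1.0, 4.0, d_y)) @ Q.T   # eigenvalues in [1, 4]
C = rng.standard_normal((d_y, d_x))
b = rng.standard_normal(d_y)
x = rng.standard_normal(d_x)
eta = 1.0 / 4.0                                      # eta = 1/L

approx = toy_hypergradient(x, H, C, b, eta, num_terms=200)
# Reference value computed with explicit linear solves, for comparison only.
exact = C.T @ np.linalg.solve(H, np.linalg.solve(H, C @ x) - b)
print(np.max(np.abs(approx - exact)))
```

In this sketch the truncation error shrinks geometrically, roughly like $(1 - \eta\mu)^K$, so the number of terms trades accuracy against Hessian-vector products. With stochastic oracles, truncating or randomizing the series introduces a bias of the kind the abstract refers to.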