In this paper, we study the conditional stochastic optimization (CSO) problem, which covers a variety of applications such as portfolio selection, reinforcement learning, robust learning, and causal inference. Because of the nested structure of the CSO objective, its sample-averaged gradient is biased, so methods that rely on it require high sample complexity to converge. We introduce a general stochastic extrapolation technique that effectively reduces this bias. We show that, for nonconvex smooth objectives, combining this extrapolation with variance reduction techniques achieves significantly better sample complexity than existing bounds. Additionally, we develop new algorithms for the finite-sum variant of the CSO problem that also significantly improve upon existing results. Finally, we believe that our debiasing technique has the potential to be a useful tool for addressing similar challenges in other stochastic optimization problems.
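To make the source of the bias concrete, recall that the CSO objective is commonly written as min_x E_ξ[ f_ξ( E_{η|ξ}[ g_η(x; ξ) ] ) ]: a nonlinear outer function applied to an inner conditional expectation. The sketch below is a toy illustration only, not the algorithm proposed in this paper. It assumes a scalar outer function f(u) = u^2 and Gaussian inner samples, and it uses a generic Richardson-style combination of estimates built from m and 2m inner samples. The names `plug_in`, `extrapolated`, and `demo` are hypothetical helpers introduced for this example.

```python
import numpy as np


def plug_in(f, inner):
    """Plug-in estimate f(sample mean); biased whenever f is nonlinear."""
    return f(inner.mean(axis=-1))


def extrapolated(f, inner):
    """Richardson-style estimate from 2m inner samples (illustrative assumption).

    Computes 2 * f(mean of all 2m samples) minus the average of f applied to
    the mean of each half of size m. The leading O(1/m) bias terms cancel.
    """
    m = inner.shape[-1] // 2
    full = f(inner[..., : 2 * m].mean(axis=-1))
    first_half = f(inner[..., :m].mean(axis=-1))
    second_half = f(inner[..., m : 2 * m].mean(axis=-1))
    return 2.0 * full - 0.5 * (first_half + second_half)


def demo(mu=1.0, sigma=2.0, m=4, trials=200_000, seed=0):
    """Return the empirical bias of both estimators of f(E[eta]) = mu**2."""
    rng = np.random.default_rng(seed)
    f = np.square  # toy outer function, chosen so the bias is known in closed form
    inner = rng.normal(mu, sigma, size=(trials, 2 * m))
    truth = f(mu)
    bias_plug = plug_in(f, inner[:, :m]).mean() - truth
    bias_extra = extrapolated(f, inner).mean() - truth
    return bias_plug, bias_extra


if __name__ == "__main__":
    bias_plug, bias_extra = demo()
    print(f"plug-in bias:      {bias_plug:.4f}")
    print(f"extrapolated bias: {bias_extra:.4f}")
```

For this quadratic toy case the plug-in estimator has expected bias sigma^2 / m, while the extrapolated combination is unbiased in expectation, so the second printed value should be close to zero up to Monte Carlo noise. For a general smooth outer function, such a combination would be expected to cancel only the leading-order bias term, and it uses twice as many inner samples per estimate.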