In this paper, we study the conditional stochastic optimization (CSO) problem, which covers a variety of applications such as portfolio selection, reinforcement learning, robust learning, and causal inference. Because of the nested structure of the CSO objective, its sample-averaged gradient is biased, and reaching convergence therefore requires a high sample complexity. We introduce a general stochastic extrapolation technique that effectively reduces this bias. We show that, for nonconvex smooth objectives, combining this extrapolation with variance reduction techniques achieves a significantly better sample complexity than existing bounds. Additionally, we develop new algorithms for the finite-sum variant of the CSO problem that also significantly improve upon existing results. Finally, we believe that our debiasing technique has the potential to be a useful tool for addressing similar challenges in other stochastic optimization problems.
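To make the source of the bias concrete, the following is a minimal sketch on a toy nested objective. It is an illustration under stated assumptions, not the algorithm developed in the paper. The outer function f(u) = u^3, the inner map g_eta(x) = x + eta with Gaussian noise eta, the constant `SIGMA`, and all function names are hypothetical choices made for this example. The plug-in gradient evaluates f' at an m-sample average of the inner map, which gives an expected bias of 3·SIGMA²/m. The extrapolated version is a generic Richardson-type combination of estimates built from 2m and m inner samples, which cancels the O(1/m) term in this toy case.

```python
import random

SIGMA = 2.0  # noise scale of the inner random variable (illustrative assumption)


def outer_grad(u):
    """Derivative of the toy outer function f(u) = u**3."""
    return 3.0 * u * u


def plug_in_grad(x, m, rng):
    """Plug-in gradient: f' evaluated at the m-sample mean of g_eta(x) = x + eta."""
    inner_mean = sum(x + rng.gauss(0.0, SIGMA) for _ in range(m)) / m
    # d/dx g_eta(x) = 1, so the chain rule adds no extra factor here.
    return outer_grad(inner_mean)


def extrapolated_grad(x, m, rng):
    """Richardson-type combination of independent 2m- and m-sample estimates.

    E[plug_in(m)] = 3x^2 + 3*SIGMA^2/m, so 2*E[plug_in(2m)] - E[plug_in(m)] = 3x^2.
    """
    return 2.0 * plug_in_grad(x, 2 * m, rng) - plug_in_grad(x, m, rng)


def mean_bias(estimator, x, m, trials, seed):
    """Monte Carlo estimate of E[estimator] minus the true gradient 3x^2."""
    rng = random.Random(seed)
    true_grad = outer_grad(x)  # F(x) = f(E[x + eta]) = x**3
    total = sum(estimator(x, m, rng) for _ in range(trials))
    return total / trials - true_grad


if __name__ == "__main__":
    for name, est in [("plug-in", plug_in_grad), ("extrapolated", extrapolated_grad)]:
        print(name, mean_bias(est, x=1.0, m=4, trials=40000, seed=0))
```

In this sketch the extrapolated estimator uses three times as many inner samples per call and has a larger variance than the plug-in estimator, which is consistent with the abstract's point that extrapolation is combined with variance reduction techniques.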