In this paper, we study the conditional stochastic optimization (CSO) problem, which covers a variety of applications including portfolio selection, reinforcement learning, robust learning, and causal inference. Because of its nested structure, the sample-averaged gradient of the CSO objective is biased, so reaching convergence requires a high sample complexity. We introduce a general stochastic extrapolation technique that effectively reduces this bias. We show that, for nonconvex smooth objectives, combining this extrapolation with variance reduction techniques achieves a significantly better sample complexity than existing bounds. We also develop new algorithms for the finite-sum variant of CSO that likewise significantly improve upon existing results. Finally, we believe that our debiasing technique could also be a useful tool for other stochastic optimization problems.
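To make the source of the bias concrete, the following toy sketch applies a nonlinear function to a sample average, which is the basic reason a nested objective yields biased plug-in estimates. It then combines estimates built from the full sample and from its two halves in a Richardson-style extrapolation that cancels the leading bias term. This is only an illustration under stated assumptions, not the algorithm proposed in the paper: the outer function `f(u) = u * u`, the Gaussian inner samples, and the helper names `plug_in`, `extrapolated`, and `estimate_biases` are all hypothetical choices made for this example.

```python
import random


def plug_in(samples, f):
    """Plug-in estimate f(sample mean); biased whenever f is nonlinear."""
    return f(sum(samples) / len(samples))


def extrapolated(samples, f):
    """Richardson-style estimate (illustrative, not the paper's scheme).

    Combines the plug-in estimate on all samples with the average of the
    plug-in estimates on the two halves, so that bias terms of order
    1/(number of samples) cancel.
    """
    half = len(samples) // 2
    first, second = samples[:half], samples[half:]
    return 2.0 * plug_in(samples, f) - 0.5 * (plug_in(first, f) + plug_in(second, f))


def estimate_biases(mu=1.0, sigma=2.0, m=4, trials=20000, seed=0):
    """Monte Carlo estimate of the bias of both estimators of f(mu)."""
    rng = random.Random(seed)
    f = lambda u: u * u          # assumed toy outer function
    target = f(mu)               # the quantity we actually want
    plug_total = 0.0
    extra_total = 0.0
    for _ in range(trials):
        samples = [rng.gauss(mu, sigma) for _ in range(2 * m)]
        plug_total += plug_in(samples[:m], f)    # uses m inner samples
        extra_total += extrapolated(samples, f)  # uses all 2 * m inner samples
    return plug_total / trials - target, extra_total / trials - target


if __name__ == "__main__":
    plug_bias, extra_bias = estimate_biases()
    print(f"plug-in bias:      {plug_bias:.3f}")
    print(f"extrapolated bias: {extra_bias:.3f}")
```

For this quadratic `f`, the plug-in estimate with `m` samples has expected bias `sigma**2 / m`, while the extrapolated estimate is unbiased in expectation. Note that the extrapolated estimate consumes `2 * m` inner samples per trial; for a general smooth `f` one would expect the leading bias term to cancel rather than the bias to vanish.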