We consider stochastic optimization problems whose objective involves the expected value of a nonlinear function of two arguments: a base random vector, and a conditional expectation of another function that depends on the base random vector, a dependent random vector, and the decision variables. We call such problems conditional stochastic optimization problems. They arise in many applications, such as uplift modeling, reinforcement learning, and contextual optimization. We propose a specialized single time-scale stochastic method for nonconvex constrained conditional stochastic optimization problems with a Lipschitz smooth outer function and a generalized differentiable inner function. In the method, we approximate the inner conditional expectation with a rich parametric model whose mean squared error satisfies a stochastic version of a {\L}ojasiewicz condition; the model is updated by an inner learning algorithm. The main feature of our approach is that unbiased stochastic estimates of the directions used by the method can be generated with one observation from the joint distribution per iteration, which makes the method applicable to real-time learning. The directions, however, are not gradients or subgradients of any overall objective function. We prove convergence of the method with probability one, using the method of differential inclusions and a specially designed Lyapunov function involving a stochastic generalization of the Bregman distance. Finally, a numerical illustration demonstrates the viability of our approach.
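For concreteness, the problem class described above can be sketched in the following form; the notation ($u$, $U$, $D$, $Z$, $f$, $g$) is our own illustrative assumption, not taken from the paper:
\[
\min_{u \in U} \; \mathbb{E}\Big[ f\big( D,\; \mathbb{E}\big[\, g(u, D, Z) \,\big|\, D \,\big] \big) \Big],
\]
where $D$ is the base random vector, $Z$ is the dependent random vector, $u \in U$ are the constrained decision variables, $f$ is the Lipschitz smooth outer function, and $g$ is the generalized differentiable inner function.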
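To illustrate the one-observation-per-iteration structure, here is a minimal runnable sketch in Python of a single time-scale scheme on a toy instance. Everything in it is our illustrative assumption rather than the paper's algorithm: the outer function $f(m) = m^2$ (ignoring direct dependence on the base vector), an inner function $g$ linear in $u$, a linear parametric model for the conditional expectation, and the step-size schedule.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
n = 3                           # dimension of the decision variable u (toy choice)

def sample_joint():
    """Draw one observation (d, z) from the joint distribution (toy model)."""
    d = rng.normal(size=n)              # base random vector D
    z = 0.5 * d.sum() + rng.normal()    # dependent random vector Z (depends on D)
    return d, z

def g(u, d, z):
    """Inner function; here smooth and linear in u for simplicity."""
    return u @ d + z

def features(d):
    """Features phi(d) of the parametric model m_theta(d) = theta @ phi(d)."""
    return d

u = np.zeros(n)        # decision variables
theta = np.zeros(n)    # parameters of the inner model

for k in range(1, 100_000):
    a = 1.0 / k**0.75  # single time scale: both updates share this schedule
    d, z = sample_joint()

    # Inner learning step: stochastic gradient on the mean squared error of
    # the model against one sample of g(u, D, Z).
    resid = g(u, d, z) - theta @ features(d)
    theta += a * resid * features(d)

    # Decision step: direction built from the outer derivative f'(m) = 2m
    # evaluated at the model output, times grad_u g(u, d, z) = d. This is an
    # unbiased estimate of the method's direction, but not a gradient of the
    # true objective, because the model replaces the conditional expectation.
    m_hat = theta @ features(d)
    u -= a * 2.0 * m_hat * d

print("decision u:", u)
print("inner model theta:", theta)
\end{verbatim}
The single time-scale character shows in the shared schedule \texttt{a}: the inner model and the decision variables are updated with step sizes of the same order, in contrast to two time-scale schemes that let the inner learner run on a faster scale.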