Oberman gave a stochastic control formulation of the problem of estimating the convex envelope of a non-convex function. Based on this, we develop a reinforcement learning scheme to approximate the convex envelope, using a variant of Q-learning for controlled optimal stopping. The scheme shows very promising results on a standard library of test problems.
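To make the optimal stopping view concrete, here is a minimal one-dimensional sketch in Python. It is not the scheme developed in this work; it is a toy illustration under stated assumptions. On a uniform grid, a symmetric random walk either stops at a point x_i and pays g(x_i), or continues to a neighbour, which gives the discrete fixed-point relation u_i = min(g_i, (u_{i-1} + u_{i+1}) / 2) with u pinned to g at the endpoints. The double-well test function `g`, the grid, the step size `alpha`, and the function names `make_grid`, `envelope_dp`, and `envelope_q_learning` are all hypothetical choices made for this example. In one dimension the only decision is whether to stop, so the control over the diffusion direction that appears in higher dimensions is absent here.

```python
import random


def g(x):
    """Non-convex double-well test function (an illustrative choice)."""
    return (x * x - 1.0) ** 2


def make_grid(lo=-2.0, hi=2.0, n=41):
    """Uniform grid on [lo, hi] with n points."""
    return [lo + (hi - lo) * i / (n - 1) for i in range(n)]


def envelope_dp(gv, tol=1e-12, max_sweeps=100_000):
    """Deterministic reference: in-place sweeps of
    u_i = min(g_i, (u_{i-1} + u_{i+1}) / 2), endpoints pinned to g."""
    u = list(gv)  # start from g, so the iterates decrease monotonically
    for _ in range(max_sweeps):
        change = 0.0
        for i in range(1, len(u) - 1):
            new = min(gv[i], 0.5 * (u[i - 1] + u[i + 1]))
            change = max(change, abs(new - u[i]))
            u[i] = new
        if change < tol:
            break
    return u


def envelope_q_learning(gv, steps=400_000, alpha=0.1, seed=0):
    """Tabular Q-learning for the same stopping problem (toy version).

    Q(x, stop) = g(x) is known, so only Q(x, continue) is learned.
    States are sampled uniformly; the step size alpha is constant."""
    rng = random.Random(seed)
    n = len(gv)
    q_cont = list(gv)  # initial guess for Q(x, continue)
    for _ in range(steps):
        i = rng.randrange(1, n - 1)                  # random interior state
        j = i + 1 if rng.random() < 0.5 else i - 1   # symmetric random step
        if j == 0 or j == n - 1:
            target = gv[j]                           # forced stop at the boundary
        else:
            target = min(gv[j], q_cont[j])           # best action at next state
        q_cont[i] += alpha * (target - q_cont[i])
    # Value estimate: the better of stopping now and continuing.
    return [min(gv[i], q_cont[i]) for i in range(n)]
```

With `xs = make_grid()` and `gv = [g(x) for x in xs]`, calling `envelope_dp(gv)` gives a grid function that agrees with g for |x| >= 1 and is flat at zero on [-1, 1], which is the convex envelope of this particular double well. `envelope_q_learning(gv)` replaces the exact average over neighbours with sampled transitions; with a constant step size its output stays noisy, so it should be read as an approximation only.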