We study the problem of computing the value function from a discretely observed trajectory of a continuous-time diffusion process. We develop a new class of algorithms based on easily implementable numerical schemes that are compatible with discrete-time reinforcement learning (RL) with function approximation. We establish high-order numerical accuracy as well as approximation error guarantees for the proposed approach. In contrast to discrete-time RL problems, where the approximation factor depends on the effective horizon, we obtain a bounded approximation factor by exploiting the underlying elliptic structure, even as the effective horizon diverges to infinity.