In this paper, we introduce a model-based deep-learning approach for solving finite-horizon continuous-time stochastic control problems with jumps. We iteratively train two neural networks: one to represent the optimal policy and the other to approximate the value function. Leveraging a continuous-time version of the dynamic programming principle, we derive two alternative training objectives based on the Hamilton-Jacobi-Bellman equation, ensuring that the networks capture the underlying stochastic dynamics. Empirical evaluations across a range of problems illustrate the accuracy and scalability of our approach, demonstrating its effectiveness in solving complex, high-dimensional stochastic control tasks.
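For orientation, the continuous-time dynamic programming principle leads, for problems of this type, to a partial integro-differential Hamilton-Jacobi-Bellman equation. A generic form, assuming a controlled jump-diffusion with drift $b$, diffusion $\sigma$, jump coefficient $\gamma$, a finite-activity jump measure $\nu$, running reward $f$, and terminal reward $g$ (placeholder notation, not necessarily the paper's), reads

\[
\partial_t V(t,x) + \sup_{a \in A} \Big\{ b(t,x,a)\cdot \nabla_x V(t,x) + \tfrac{1}{2}\operatorname{Tr}\!\big(\sigma\sigma^{\top}(t,x,a)\,\nabla_x^2 V(t,x)\big) + \int \big[V\big(t,x+\gamma(t,x,a,z)\big) - V(t,x)\big]\,\nu(\mathrm{d}z) + f(t,x,a) \Big\} = 0,
\]

with terminal condition $V(T,x) = g(x)$ (replace $\sup$ with $\inf$ for cost minimization). Training objectives of the kind described in the abstract typically penalize the residual of this equation or enforce the corresponding one-step dynamic programming identity.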
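As a rough illustration of the iterative actor-critic scheme described above, the following is a minimal PyTorch sketch. Everything in it is an assumption for illustration: the toy Euler-discretized jump-diffusion, the quadratic running reward, the network sizes, and the one-step dynamic-programming losses stand in for the paper's HJB-based objectives, which are not reproduced here.

```python
# Minimal sketch (illustrative assumptions, not the paper's exact algorithm):
# alternate gradient steps on a critic network (value function) and an actor
# network (policy), using a one-step dynamic-programming target.
import torch
import torch.nn as nn

d, a_dim, T, dt = 2, 1, 1.0, 0.01  # state dim, control dim, horizon, time step

value = nn.Sequential(nn.Linear(d + 1, 64), nn.Tanh(), nn.Linear(64, 1))
policy = nn.Sequential(nn.Linear(d + 1, 64), nn.Tanh(), nn.Linear(64, a_dim))
opt_v = torch.optim.Adam(value.parameters(), lr=1e-3)
opt_p = torch.optim.Adam(policy.parameters(), lr=1e-3)

def step(x, a):
    """One Euler step of a toy controlled jump-diffusion (placeholder dynamics)."""
    dw = torch.randn_like(x) * dt ** 0.5                  # Brownian increment
    dn = (torch.rand(x.shape[0], 1) < 1.0 * dt).float()   # Poisson jumps, intensity 1
    jump = 0.1 * torch.randn_like(x) * dn                 # i.i.d. Gaussian jump sizes
    return x + a * dt + 0.2 * dw + jump                   # drift b = a, vol sigma = 0.2

for it in range(2000):
    t = torch.rand(256, 1) * (T - dt)                     # sample interior times
    x = torch.randn(256, d)                               # sample states
    a = policy(torch.cat([t, x], 1))
    x_next = step(x, a)
    reward = -(x ** 2).sum(1, keepdim=True) * dt          # toy running reward f dt

    # Critic update: one-step Bellman (dynamic programming) residual.
    target = reward + value(torch.cat([t + dt, x_next], 1)).detach()
    v_loss = (value(torch.cat([t, x], 1)) - target).pow(2).mean()
    opt_v.zero_grad()
    v_loss.backward()
    opt_v.step()

    # Actor update: ascend the same one-step value estimate w.r.t. the control.
    a = policy(torch.cat([t, x], 1))
    p_loss = -(reward + value(torch.cat([t + dt, step(x, a)], 1))).mean()
    opt_p.zero_grad()
    p_loss.backward()
    opt_p.step()
```

The terminal condition (fitting the value network to the terminal reward at time $T$) is omitted here for brevity; a full implementation would also need it to pin down the value function.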