Simulation-based solvers for optimal stopping problems must discretize the stopping decision. Under classical dynamic programming, a coarse exercise grid with only a few stopping opportunities can materially undervalue the optimal expected reward, whereas on a very fine grid, approximation errors accumulate through the backward recursion. To remove this limitation, we develop a new reinforcement-learning-inspired algorithm that learns the exercise rule at arbitrarily fine time resolution. Our CARLOS (Continuous-time Adaptive Reinforcement Learning for Optimal Stopping) algorithm uses an aggregate deep neural network (ADNN) to learn a joint space-time decision boundary. Starting from a coarse time grid, we progressively increase the frequency of stopping opportunities while training the ADNN in parallel to refine its timing-value estimates. We also design an adaptive sampling strategy that gradually concentrates training effort near the stopping boundary. Benchmark results show that CARLOS delivers higher prices than existing Bermudan solvers, approaching the American upper bound, and achieves high computational efficiency relative to non-RL comparators.
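The claim that a coarse exercise grid undervalues the optimal reward can be illustrated with a small sketch. This is not CARLOS and not a simulation-based solver: it swaps in a deterministic Cox-Ross-Rubinstein binomial lattice so the effect is visible without Monte Carlo noise. The function name `bermudan_put_binomial` and all parameter values (spot 36, strike 40, rate 6%, volatility 20%, one-year maturity) are illustrative assumptions, not taken from the paper.

```python
import math


def bermudan_put_binomial(s0, strike, rate, sigma, maturity, n_steps, n_exercise):
    """Price a put on a CRR binomial tree with n_exercise equally spaced
    exercise dates, the last one at maturity (hypothetical helper)."""
    if n_exercise < 1 or n_steps % n_exercise != 0:
        raise ValueError("n_exercise must be a positive divisor of n_steps")

    dt = maturity / n_steps
    up = math.exp(sigma * math.sqrt(dt))
    down = 1.0 / up
    disc = math.exp(-rate * dt)
    p_up = (math.exp(rate * dt) - down) / (up - down)  # risk-neutral up probability
    stride = n_steps // n_exercise  # tree steps between exercise dates

    def payoff(step, j):
        # Node (step, j): j up moves and (step - j) down moves.
        spot = s0 * up ** j * down ** (step - j)
        return max(strike - spot, 0.0)

    # Maturity is always an exercise date.
    values = [payoff(n_steps, j) for j in range(n_steps + 1)]

    # Backward recursion: continuation value everywhere, early exercise
    # only on the allowed dates (time 0 is excluded).
    for step in range(n_steps - 1, -1, -1):
        exercisable = step > 0 and step % stride == 0
        new_values = []
        for j in range(step + 1):
            value = disc * (p_up * values[j + 1] + (1.0 - p_up) * values[j])
            if exercisable:
                value = max(value, payoff(step, j))
            new_values.append(value)
        values = new_values
    return values[0]


if __name__ == "__main__":
    # Nested exercise grids on the same 256-step tree.
    for k in (1, 4, 16, 64, 256):
        price = bermudan_put_binomial(36.0, 40.0, 0.06, 0.2, 1.0, 256, k)
        print(f"{k:4d} exercise dates: {price:.4f}")
```

Because each finer grid contains the coarser one, the price cannot decrease as exercise dates are added; a single date reduces to the European put, and exercising at every tree step approximates the American value. The lattice avoids the error accumulation that the abstract attributes to regression-based backward recursion, so it shows only the first half of the trade-off.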