High-dimensional path-dependent optimal stopping (OS) is important to multiple academic communities and applications. Modern OS tasks often have a large number of decision epochs and complicated non-Markovian dynamics, making them especially challenging. Standard approaches, often relying on approximate dynamic programming (ADP), duality, deep learning, and other heuristics, have shown strong empirical performance, yet they come with limited rigorous guarantees (which may scale exponentially in the problem parameters and/or require prior knowledge of basis functions or additional continuity assumptions). Although past work has placed these problems in the framework of computational complexity and polynomial-time approximability, those analyses were limited to simple one-dimensional problems. For long-horizon, complex OS problems, is a polynomial-time solution even theoretically possible? We prove that, given access to an efficient simulator of the underlying information process and a fixed accuracy epsilon, there exists an algorithm that returns an epsilon-optimal solution (both stopping policies and approximate optimal values) with computational complexity scaling polynomially in the time horizon and the underlying dimension. Like the first polynomial-time (approximation) algorithms for several other well-studied problems, our theoretical guarantees are polynomial but not yet practical. Our approach is based on a novel expansion for the optimal value, which may be of independent interest.