The standard theory of optimal stopping rests on the idealised assumption that the underlying process is essentially known. In this paper, we drop this restriction and study data-driven optimal stopping for a general diffusion process, focusing on the statistical performance of the proposed estimator of the optimal stopping barrier. More specifically, we derive non-asymptotic upper bounds on the simple regret, along with uniform and non-asymptotic PAC bounds. Minimax optimality is verified by complementing the upper bounds with matching lower bounds on the simple regret. All results are shown both under general conditions on the payoff functions and under more refined assumptions that mimic the margin condition used in binary classification, the latter leading to an improved rate of convergence. Additionally, we investigate how our results on the simple regret transfer to the cumulative regret of a specific exploration-exploitation strategy, with respect to both lower and upper bounds.