We develop horizon-aware anytime-valid tests and confidence sequences for bounded means under a strict deadline $N$. Using the betting/e-process framework, we cast horizon-aware betting as a finite-horizon optimal control problem with state space $(t, \log W_t)$, where $t$ is the time and $W_t$ is the value of the test martingale. We first show that in certain interior regions of the state space, policies that deviate significantly from Kelly betting are provably suboptimal, while Kelly betting reaches the threshold with high probability. We then identify sufficient conditions under which, outside this region, betting more aggressively than Kelly can be better if the bettor is behind schedule, and betting less aggressively can be better if the bettor is ahead. Taken together, these results suggest a simple phase diagram in the $(t, \log W_t)$ plane, delineating regions where Kelly, fractional Kelly, and aggressive betting may be preferable. Guided by this phase diagram, we introduce a Deep Reinforcement Learning approach based on a universal Deep Q-Network (DQN) agent that learns a single policy from synthetic experience and maps simple statistics of past observations to bets across horizons and null values. In limited-horizon experiments, the learned DQN policy yields state-of-the-art results.
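To make the state $(t, \log W_t)$ concrete, the following is a minimal sketch of a betting test for a bounded mean under a deadline, not the method proposed here. It assumes observations in $[0, 1]$, a null hypothesis that the mean equals `m`, a rejection threshold of $1/\alpha$ on the wealth, and a fixed bet `lam` held constant over time; the function names `kelly_bet_bernoulli` and `run_betting_test` are hypothetical. The Kelly bet shown is the log-optimal fraction only for the illustrative case of Bernoulli data with a known alternative mean `p`. The horizon-aware and learned policies described above would instead choose the bet as a function of the state.

```python
import math
import random


def kelly_bet_bernoulli(p, m):
    """Log-optimal (Kelly) bet for Bernoulli(p) data against null mean m.

    Maximizes E[log(1 + lam * (X - m))] over lam; assumes p is known,
    which is an illustrative simplification.
    """
    return (p - m) / (m * (1.0 - m))


def run_betting_test(xs, m, lam, alpha=0.05):
    """Bet a fixed fraction lam on each x in [0, 1] against H0: mean == m.

    The deadline N is len(xs). Wealth starts at 1 and is multiplied by
    1 + lam * (x - m) after each observation. Returns a tuple
    (rejected, stopping_time, final_log_wealth).
    """
    if not 0.0 < m < 1.0:
        raise ValueError("m must lie strictly between 0 and 1")
    # This range keeps every wealth factor strictly positive for x in [0, 1].
    if not -1.0 / (1.0 - m) < lam < 1.0 / m:
        raise ValueError("lam must keep the wealth strictly positive")
    log_threshold = math.log(1.0 / alpha)
    log_w = 0.0
    t = 0
    for t, x in enumerate(xs, start=1):
        log_w += math.log1p(lam * (x - m))  # update log W_t
        if log_w >= log_threshold:          # threshold reached: reject H0
            return True, t, log_w
    return False, t, log_w                  # deadline hit without rejecting


if __name__ == "__main__":
    rng = random.Random(0)
    N, p, m = 200, 0.6, 0.5
    xs = [1.0 if rng.random() < p else 0.0 for _ in range(N)]
    lam = kelly_bet_bernoulli(p, m)
    print(run_betting_test(xs, m, lam))
```

Tracking `log_w` against `log_threshold` at each `t` gives the trajectory in the $(t, \log W_t)$ plane; a bettor whose `log_w` is far below the threshold with few steps remaining is the "behind schedule" case in which, per the results above, a bet larger than Kelly can be preferable.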