Stochastic resetting, where a dynamical process is intermittently returned to a fixed reference state, has emerged as a powerful mechanism for optimizing first-passage properties. Existing theory largely treats static, non-learning processes. Here we ask how stochastic resetting interacts with reinforcement learning, where the underlying dynamics adapt through experience. In tabular grid environments, we find that resetting accelerates policy convergence even when it does not reduce the search time of a purely diffusive agent, indicating a novel mechanism beyond classical first-passage optimization. In a continuous control task with neural-network-based value approximation, we show that random resetting improves deep reinforcement learning when exploration is difficult and rewards are sparse. Unlike temporal discounting, resetting preserves the optimal policy while accelerating convergence by truncating long, uninformative trajectories to enhance value propagation. Our results establish stochastic resetting as a simple, tunable mechanism for accelerating learning, translating a canonical phenomenon of statistical mechanics into an optimization principle for reinforcement learning.
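To make the mechanism concrete, the following is a minimal sketch (not the paper's code) of tabular Q-learning on a grid with Poissonian stochastic resetting: at each step, with some small probability, the agent is teleported back to the start state, and the reset transition is excluded from the Bellman update so the optimal policy is unchanged. All names and parameters here (reset_rate, grid size, alpha, gamma, eps) are illustrative assumptions, not values from the paper.

```python
# Sketch: tabular Q-learning on an N x N grid with stochastic resetting.
# Reward 1 at the goal corner, 0 elsewhere; epsilon-greedy exploration.
import numpy as np

rng = np.random.default_rng(0)

N = 10                                         # grid side length
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right
START, GOAL = (0, 0), (N - 1, N - 1)

reset_rate = 0.01            # per-step probability of resetting to START
alpha, gamma, eps = 0.1, 0.99, 0.1

Q = np.zeros((N, N, len(ACTIONS)))

def step(s, a):
    """Deterministic grid move clipped at the walls; reward only at the goal."""
    (r, c), (dr, dc) = s, ACTIONS[a]
    s2 = (min(max(r + dr, 0), N - 1), min(max(c + dc, 0), N - 1))
    return s2, float(s2 == GOAL), s2 == GOAL

for episode in range(500):
    s, done = START, False
    while not done:
        # Stochastic resetting: with rate reset_rate, return the agent to the
        # reference state. The jump is not used as a learning signal, so the
        # Bellman target (and hence the optimal policy) is unchanged; the reset
        # only truncates long, uninformative excursions.
        if rng.random() < reset_rate:
            s = START
            continue
        a = int(rng.integers(len(ACTIONS))) if rng.random() < eps else int(np.argmax(Q[s]))
        s2, reward, done = step(s, a)
        target = reward + (0.0 if done else gamma * np.max(Q[s2]))
        Q[s][a] += alpha * (target - Q[s][a])
        s = s2
```

In this reading, reset_rate is the tunable knob the abstract alludes to: set too high, the agent rarely survives long enough to reach a sparse reward; set too low, long diffusive excursions dominate and value information propagates slowly back to the start.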