We investigate efficient exploration strategies for environments with unknown stochastic dynamics and sparse rewards. Specifically, we first analyze the impact of parallel simulations on the probability of reaching rare states within a finite time budget. Using simplified models based on random walks and Lévy processes, we provide analytical results demonstrating a phase transition in reaching probabilities as a function of the number of parallel simulations. We identify an optimal number of parallel simulations that balances exploration diversity against the time allocated to each simulation. Additionally, we analyze a restarting mechanism that exponentially enhances the probability of success by redirecting effort toward more promising regions of the state space. Our findings contribute to a qualitative and quantitative theory of some exploration schemes in reinforcement learning, offering insights for developing more efficient strategies in environments characterized by rare events.
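The trade-off described above can be illustrated with a minimal Monte Carlo sketch: a fixed total step budget is split evenly among `n_parallel` independent simple symmetric random walks, and we estimate the probability that at least one walk reaches a rare threshold state. This is a simplified stand-in for the paper's setting, not its exact model; the function name, parameters, and the choice of a ±1 walk are illustrative assumptions.

```python
import random

def reach_probability(n_parallel, total_budget, threshold, trials=2000, seed=0):
    """Estimate the probability that at least one of n_parallel independent
    simple symmetric random walks (steps of +1 or -1) reaches `threshold`
    when the total step budget is split evenly among the walks.

    Illustrative toy model only: the paper's analysis covers more general
    random walks and Lévy processes.
    """
    rng = random.Random(seed)
    steps_each = total_budget // n_parallel  # each walk gets an equal time slice
    successes = 0
    for _ in range(trials):
        hit = False
        for _ in range(n_parallel):
            pos = 0
            for _ in range(steps_each):
                pos += 1 if rng.random() < 0.5 else -1
                if pos >= threshold:
                    hit = True
                    break
            if hit:
                break
        successes += hit
    return successes / trials
```

Sweeping `n_parallel` while holding `total_budget` fixed exhibits the tension the abstract describes: more parallel walks diversify exploration, but each walk gets fewer steps and so is less likely to travel far enough to reach a distant rare state.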