We present AIRS (Automatic Intrinsic Reward Shaping), a method that intelligently and adaptively provides high-quality intrinsic rewards to enhance exploration in reinforcement learning (RL). More specifically, AIRS selects a shaping function from a predefined set in real time, based on the estimated task return, thereby providing reliable exploration incentives and alleviating the biased objective problem. Moreover, we develop an intrinsic reward toolkit that provides efficient and reliable implementations of diverse intrinsic reward approaches. We test AIRS on various tasks from MiniGrid, Procgen, and the DeepMind Control Suite. Extensive simulations demonstrate that AIRS can outperform the benchmarking schemes and achieve superior performance with a simple architecture.
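To make the selection step concrete, the following is a minimal sketch of choosing a shaping function from a predefined set according to the estimated task return. The abstract does not state the selection rule, so the UCB1 bandit used here is an assumption, as are the class name `ShapingSelector`, the candidate names (`none`, `count_bonus`, `novelty_bonus`), and the fixed toy returns that stand in for returns estimated during training.

```python
import math


class ShapingSelector:
    """UCB1 bandit over candidate shaping functions (hypothetical helper, not the authors' code)."""

    def __init__(self, names, c=1.0):
        self.names = list(names)
        self.c = c  # exploration coefficient (assumed value)
        self.counts = {n: 0 for n in self.names}
        self.values = {n: 0.0 for n in self.names}  # running mean of estimated task return
        self.total = 0

    def select(self):
        # Try every candidate once before applying the UCB rule.
        for n in self.names:
            if self.counts[n] == 0:
                return n

        def ucb(n):
            bonus = self.c * math.sqrt(math.log(self.total) / self.counts[n])
            return self.values[n] + bonus

        return max(self.names, key=ucb)

    def update(self, name, estimated_return):
        # Incremental mean update for the chosen candidate.
        self.counts[name] += 1
        self.total += 1
        self.values[name] += (estimated_return - self.values[name]) / self.counts[name]


# Toy stand-in for the task return estimated after training with each candidate.
toy_return = {"none": 0.2, "count_bonus": 0.8, "novelty_bonus": 0.5}

selector = ShapingSelector(toy_return.keys())
for _ in range(200):
    choice = selector.select()
    selector.update(choice, toy_return[choice])

most_selected = max(selector.counts, key=selector.counts.get)
print(most_selected, selector.counts)
```

In this toy setting the selector concentrates its choices on the candidate with the highest estimated return while still occasionally revisiting the others. In an actual RL loop, the `update` call would receive the return estimated from recent rollouts rather than a fixed number.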