Reinforcement learning under sparse extrinsic rewards remains challenging despite surging interest in the field. Previous work suggests that intrinsic rewards can alleviate the problems caused by sparsity. In this article, we present a novel intrinsic reward inspired by human learning: humans gauge curiosity by comparing current observations with historical knowledge. Our method trains a self-supervised prediction model, saves snapshots of the model parameters, and uses the nuclear norm to measure the temporal inconsistency between the predictions of different snapshots, which serves as the intrinsic reward. We also propose a variational weighting mechanism that adaptively assigns weights to the different snapshots. Experimental results on various benchmark environments demonstrate the efficacy of our method, which outperforms other intrinsic reward-based methods without additional training cost and with higher noise tolerance. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.
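To make the snapshot idea concrete, here is a minimal sketch of how a nuclear-norm inconsistency score could be computed from the predictions of several saved snapshots. The abstract does not give the exact formulation, so everything specific here is an assumption: the function name `intrinsic_reward`, the choice to center the predictions around their weighted mean, and the use of externally supplied weights in place of the variational weighting mechanism (uniform weights by default).

```python
import numpy as np


def intrinsic_reward(snapshot_preds, weights=None):
    """Nuclear-norm disagreement between snapshot predictions (illustrative only).

    snapshot_preds: array of shape (K, d); row k is the prediction made by
        snapshot k for the same observation.
    weights: optional length-K non-negative weights summing to 1. They stand
        in for the paper's variational weighting, which is not reproduced here.
    """
    preds = np.asarray(snapshot_preds, dtype=float)
    num_snapshots = preds.shape[0]
    if weights is None:
        # Assumption: fall back to uniform weights over snapshots.
        weights = np.full(num_snapshots, 1.0 / num_snapshots)
    weights = np.asarray(weights, dtype=float)

    # Assumption: center on the weighted mean so that snapshots which agree
    # exactly yield a reward of zero.
    centered = preds - weights @ preds

    # Scale each row by the square root of its weight, then sum the singular
    # values of the resulting matrix (the nuclear norm).
    weighted = centered * np.sqrt(weights)[:, None]
    return float(np.linalg.norm(weighted, ord="nuc"))


if __name__ == "__main__":
    agree = np.ones((3, 4))   # all snapshots predict the same vector
    disagree = np.eye(3)      # snapshots predict mutually different vectors
    print(intrinsic_reward(agree), intrinsic_reward(disagree))
```

Under these assumptions, states on which old and new snapshots agree receive little reward, while states on which they disagree receive more, which matches the intuition of comparing current observations with historical knowledge.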