Exploration remains a critical issue in deep reinforcement learning for an agent to attain high returns in unknown environments. Although the prevailing Random Network Distillation (RND) exploration algorithm has been shown to be effective in numerous environments, it often lacks the discriminative power needed for precise bonus allocation. This paper highlights the "bonus inconsistency" issue within RND, identifying it as the algorithm's primary limitation. To address this issue, we introduce Distributional RND (DRND), a derivative of RND. DRND enhances the exploration process by distilling a distribution of random target networks and implicitly incorporating pseudo counts to improve the precision of bonus allocation. This refinement encourages agents to engage in more extensive exploration. Our method effectively mitigates the inconsistency issue without introducing significant computational overhead. Both theoretical analysis and experimental results demonstrate the superiority of our approach over the original RND algorithm. DRND excels in challenging online exploration scenarios and also serves effectively as an anti-exploration mechanism in D4RL offline tasks. Our code is publicly available at https://github.com/yk7333/DRND.
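To make the abstract's mechanism concrete, below is a minimal sketch of a DRND-style bonus in PyTorch: a predictor distills a distribution of frozen random target networks, and the bonus combines the distillation error with a pseudo-count-like term derived from the first and second moments of the target outputs. The class and function names (`DRNDBonus`, `make_net`), network sizes, number of targets, the mixing weight `alpha`, and the exact form of the moment-based term are illustrative assumptions, not the authors' implementation; consult the repository above for the actual code.

```python
import torch
import torch.nn as nn

def make_net(in_dim, out_dim):
    # Small MLP; architecture is an illustrative assumption.
    return nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, out_dim))

class DRNDBonus:
    def __init__(self, obs_dim, feat_dim=64, n_targets=10, alpha=0.9):
        # A *distribution* of randomly initialized, frozen target networks.
        self.targets = [make_net(obs_dim, feat_dim) for _ in range(n_targets)]
        for t in self.targets:
            for p in t.parameters():
                p.requires_grad_(False)
        self.predictor = make_net(obs_dim, feat_dim)
        self.alpha = alpha

    def bonus(self, obs):
        with torch.no_grad():
            outs = torch.stack([t(obs) for t in self.targets])  # (N, B, D)
            mu = outs.mean(0)           # first moment of target outputs
            b2 = (outs ** 2).mean(0)    # second moment of target outputs
        pred = self.predictor(obs)
        # Term 1: distillation error toward the mean of the target distribution.
        err = ((pred - mu) ** 2).sum(-1)
        # Term 2: a moment-based statistic acting as an implicit pseudo count --
        # small for frequently visited states, large for novel ones (sketch only).
        ratio = ((pred ** 2 - mu ** 2).abs() / (b2 - mu ** 2 + 1e-8)).clamp(min=1e-8)
        count_term = ratio.sqrt().sum(-1)
        return self.alpha * err + (1 - self.alpha) * count_term

    def update(self, obs, optimizer):
        # Distillation loss: regress the predictor toward one randomly
        # sampled target per update (a simple choice for this sketch).
        idx = torch.randint(len(self.targets), (1,)).item()
        with torch.no_grad():
            tgt = self.targets[idx](obs)
        loss = ((self.predictor(obs) - tgt) ** 2).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()
```

In use, the predictor would be trained on visited states, e.g. `opt = torch.optim.Adam(drnd.predictor.parameters(), lr=1e-4)` followed by `drnd.update(obs_batch, opt)`, while `drnd.bonus(obs_batch)` supplies the intrinsic reward (online exploration) or its negation (offline anti-exploration).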