We propose HyperAgent, a reinforcement learning (RL) algorithm based on the hypermodel framework for exploration in RL. HyperAgent enables efficient incremental approximation of the posteriors associated with an optimal action-value function ($Q^\star$) without requiring conjugacy, and it follows greedy policies with respect to these approximate posterior samples. We demonstrate that HyperAgent performs robustly on large-scale deep RL benchmarks: it solves the hard-exploration Deep Sea problems with a number of episodes that scales optimally with problem size, and it yields significant efficiency gains on the Atari suite. Implementing HyperAgent requires only a small code addition to well-established deep RL frameworks such as DQN. We theoretically prove that, under tabular assumptions, HyperAgent achieves logarithmic per-step computational complexity while attaining sublinear regret, matching the best-known randomized tabular RL algorithm.
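To make the hypermodel mechanism described above concrete, the following is a minimal, hypothetical PyTorch sketch, not the authors' implementation: a network maps a state together with a random index vector $z$ to Q-values, so each sampled $z$ serves as one approximate posterior sample of $Q^\star$, and the agent acts greedily under that sample. All names (`HyperQNet`, `index_dim`) and architectural choices are illustrative assumptions.

```python
# A minimal sketch (not the authors' code) of a hypermodel for exploration:
# the Q-network takes a random index z alongside the state, so sampling z
# yields a cheap approximate posterior sample of the Q-function.
import torch
import torch.nn as nn

class HyperQNet(nn.Module):  # hypothetical name
    def __init__(self, state_dim: int, num_actions: int, index_dim: int = 8):
        super().__init__()
        self.index_dim = index_dim
        self.num_actions = num_actions
        self.torso = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU())
        # Hypermodel head: the output is linear in the index z, so each
        # sampled z induces one sampled Q-function.
        self.head = nn.Linear(64, num_actions * index_dim)

    def forward(self, state: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
        feats = self.head(self.torso(state))                   # (B, A * index_dim)
        feats = feats.view(-1, self.num_actions, self.index_dim)
        return torch.einsum("bai,i->ba", feats, z)             # (B, A) Q-values

# Greedy action w.r.t. one posterior sample: draw z once per episode
# (for deep exploration), then act greedily under Q(., .; z) in that episode.
net = HyperQNet(state_dim=4, num_actions=2)
z = torch.randn(net.index_dim)        # one index ~ one posterior sample
state = torch.zeros(1, 4)             # placeholder observation
action = net(state, z).argmax(dim=-1)
```

Since the sampled index only multiplies the final linear features, drawing a fresh $z$ costs a single matrix-vector product, which is consistent with the incremental, conjugacy-free posterior approximation and the low per-step compute the abstract claims.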