To solve complex tasks under resource constraints, reinforcement learning (RL) agents need to be simple, efficient, and scalable, addressing (1) large state spaces and (2) the continuous accumulation of interaction data. We propose HyperAgent, an RL framework featuring hypermodel and index sampling schemes that enable computation-efficient incremental approximation of the posteriors over general value functions without requiring conjugacy, together with data-efficient action selection. Implementing HyperAgent is straightforward, requiring only one additional module beyond what is necessary for Double-DQN. HyperAgent is the first method to offer robust performance on large-scale deep RL benchmarks while achieving provably scalable per-step computational complexity and attaining sublinear regret under tabular assumptions. HyperAgent solves the hard-exploration Deep Sea problems in a number of episodes that scales optimally with problem size, and exhibits significant efficiency gains in both data and computation on the Atari benchmark. The core of our theoretical analysis is the sequential posterior approximation argument, enabled by the first analytical tool for sequential random projection: a non-trivial martingale extension of the Johnson-Lindenstrauss lemma. This work bridges the theoretical and practical realms of RL, establishing a new benchmark for RL algorithm design.
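To make the hypermodel and index-sampling ideas concrete, the sketch below shows one common way such an agent can be structured: a shared feature trunk plus a hypermodel head that maps a random index `z` to the weights of the final Q-value layer, so that each sampled index selects one plausible Q-function from an approximate posterior. This is a minimal illustrative sketch, not the authors' exact architecture; all names (`HyperQNet`, `sample_index`), dimensions, and the affine last-layer hypermodel form are assumptions made for illustration.

```python
import torch
import torch.nn as nn


class HyperQNet(nn.Module):
    """Minimal last-layer hypermodel sketch (illustrative, not HyperAgent's exact design).

    A shared trunk computes features phi(s); a hypermodel maps a random
    index z ~ N(0, I) to the weights and bias of the final linear layer,
    so each z indexes one plausible Q-function.
    """

    def __init__(self, obs_dim: int, num_actions: int, index_dim: int = 8, hidden: int = 64):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Hypermodel: affine map from index z to last-layer parameters.
        self.index_dim = index_dim
        self.num_actions = num_actions
        self.hidden = hidden
        self.weight_gen = nn.Linear(index_dim, hidden * num_actions)
        self.bias_gen = nn.Linear(index_dim, num_actions)

    def forward(self, obs: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
        phi = self.trunk(obs)                                            # (B, hidden)
        W = self.weight_gen(z).view(-1, self.num_actions, self.hidden)  # (B, A, hidden)
        b = self.bias_gen(z)                                             # (B, A)
        return torch.einsum("bah,bh->ba", W, phi) + b                    # (B, A) Q-values


def sample_index(batch: int, index_dim: int) -> torch.Tensor:
    # Index sampling: draw a fresh Gaussian index (e.g., once per episode).
    return torch.randn(batch, index_dim)


# Usage: sample one index at the start of an episode, then act greedily
# with respect to the sampled Q-function (approximate posterior sampling).
net = HyperQNet(obs_dim=4, num_actions=2)
z = sample_index(1, net.index_dim)
obs = torch.zeros(1, 4)
action = net(obs, z).argmax(dim=-1)
```

Resampling `z` per episode yields deep-exploration behavior in the spirit of posterior sampling, while the per-step cost stays at a single forward pass plus the small index-to-weights map, which is the one extra module beyond a Double-DQN setup mentioned above.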