To solve complex tasks under resource constraints, reinforcement learning (RL) agents need to be simple, efficient, and scalable, addressing (1) large state spaces and (2) the continuous accumulation of interaction data. We propose HyperAgent, an RL framework built on a hypermodel and index sampling schemes, which enable computation-efficient incremental approximation of the posteriors associated with general value functions without requiring conjugacy, as well as data-efficient action selection. Implementing HyperAgent is straightforward, requiring only one additional module beyond what is necessary for Double-DQN. HyperAgent stands out as the first method to offer robust performance on large-scale deep RL benchmarks while achieving provably scalable per-step computational complexity and attaining sublinear regret under tabular assumptions. HyperAgent can solve Deep Sea hard exploration problems with a number of episodes that scales optimally with problem size, and it exhibits significant efficiency gains in both data and computation on the Atari benchmark. The core of our theoretical analysis is the sequential posterior approximation argument, enabled by the first analytical tool for sequential random projection: a non-trivial martingale extension of the Johnson-Lindenstrauss lemma. This work bridges the theoretical and practical realms of RL, establishing a new benchmark for RL algorithm design.
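To make the idea of index sampling concrete, the following is a minimal sketch and not the HyperAgent implementation. It assumes a linear hypermodel in which the value of each action is a point estimate plus an inner product between a per-action uncertainty vector and a random Gaussian index; the function names, the two-dimensional index, and all numbers are hypothetical and chosen only for illustration.

```python
import random


def sample_index(dim, rng):
    """Draw a random index xi with independent standard normal entries."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def q_values(mean, factors, index):
    """Assumed linear hypermodel: Q(a; xi) = mean[a] + <factors[a], xi>."""
    return [
        m + sum(f * x for f, x in zip(factor, index))
        for m, factor in zip(mean, factors)
    ]


def greedy_action(mean, factors, index):
    """Act greedily with respect to the value sample induced by one index."""
    q = q_values(mean, factors, index)
    return max(range(len(q)), key=q.__getitem__)


# Hypothetical numbers: action 0 is well explored (small uncertainty),
# action 1 is slightly worse on average but highly uncertain,
# action 2 is clearly poor and well explored.
mean = [1.0, 0.9, 0.2]
factors = [[0.01, 0.0], [0.5, 0.5], [0.0, 0.01]]

rng = random.Random(0)
counts = [0, 0, 0]
for _ in range(1000):
    xi = sample_index(2, rng)  # one fresh index per decision
    counts[greedy_action(mean, factors, xi)] += 1
```

In this toy setting, each sampled index yields one plausible value function, so the uncertain action 1 is tried on a substantial fraction of draws while the clearly poor action 2 is not. This is the sense in which randomizing over indices can drive exploration without maintaining a conjugate posterior.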