To solve complex tasks under resource constraints, reinforcement learning (RL) agents need to be simple, efficient, and scalable, addressing (1) large state spaces and (2) the continuous accumulation of interaction data. We propose HyperAgent, an RL framework built on a hypermodel and index sampling schemes, which enable computation-efficient incremental approximation of the posteriors associated with general value functions without requiring conjugacy, as well as data-efficient action selection. Implementing HyperAgent is straightforward, requiring only one additional module beyond what is necessary for Double-DQN. HyperAgent is the first method to offer robust performance on large-scale deep RL benchmarks while achieving provably scalable per-step computational complexity and attaining sublinear regret under tabular assumptions. HyperAgent solves Deep Sea hard exploration problems with a number of episodes that scales optimally with problem size, and it exhibits significant efficiency gains in both data and computation on the Atari benchmark. The core of our theoretical analysis is a sequential posterior approximation argument, enabled by the first analytical tool for sequential random projection, a non-trivial martingale extension of the Johnson-Lindenstrauss lemma. This work bridges the theoretical and practical realms of RL, establishing a new benchmark for RL algorithm design.
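To make the idea of a hypermodel with index sampling concrete, here is a minimal sketch; it is not the HyperAgent implementation. It assumes a linear hypermodel that maps a Gaussian index `xi` to per-action weights `mu + A @ xi`, so that each sampled index yields one plausible value function and the agent acts greedily with respect to it. The class `LinearHypermodel`, the function `select_action`, the Gaussian index distribution, and all dimensions are hypothetical choices made for illustration; the abstract does not specify them.

```python
import numpy as np


class LinearHypermodel:
    """Hypothetical linear hypermodel: an index xi selects one Q-function."""

    def __init__(self, feature_dim, num_actions, index_dim, rng):
        # Mean weights (assumed form): the "point estimate" part of the model.
        self.mu = np.zeros((num_actions, feature_dim))
        # Index-dependent part (assumed form): encodes uncertainty around mu.
        self.A = rng.normal(scale=0.1, size=(num_actions, feature_dim, index_dim))
        self.index_dim = index_dim

    def sample_index(self, rng):
        # Index sampling: draw xi from a standard Gaussian (an assumption).
        return rng.standard_normal(self.index_dim)

    def q_values(self, features, xi):
        # Weights for this index have shape (num_actions, feature_dim).
        weights = self.mu + self.A @ xi
        # One Q-value per action for the given state features.
        return weights @ features


def select_action(model, features, xi):
    # Act greedily with respect to the Q-function selected by the index xi.
    return int(np.argmax(model.q_values(features, xi)))


rng = np.random.default_rng(0)
model = LinearHypermodel(feature_dim=4, num_actions=3, index_dim=8, rng=rng)
features = rng.standard_normal(4)   # stand-in for learned state features
xi = model.sample_index(rng)        # e.g. resampled once per episode
action = select_action(model, features, xi)
```

In this sketch, exploration comes from resampling `xi`: different indices give different Q-values for the same state, so the greedy action varies in proportion to the spread encoded in `A`. How `mu` and `A` would be trained is left out, since the abstract does not describe it.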