Recent work has proposed accelerating the wall-clock training time of actor-critic methods via large-scale environment parallelization; unfortunately, these methods can still require a large number of environment interactions to reach a desired level of performance. Noting that well-structured representations can improve the generalization and sample efficiency of deep reinforcement learning (RL) agents, we propose the use of simplicial embeddings: lightweight representation layers that constrain embeddings to simplicial structures. This geometric inductive bias yields sparse, discrete features that stabilize critic bootstrapping and strengthen policy gradients. When applied to FastTD3, FastSAC, and PPO, simplicial embeddings consistently improve sample efficiency and final performance across a variety of continuous- and discrete-control environments, without any loss in runtime speed.
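A common way to constrain an embedding to a simplicial structure is to split the feature vector into groups and apply a temperature-scaled softmax within each group, so that each group lies on a probability simplex. The sketch below is a minimal NumPy illustration of that idea, not the authors' implementation; the function name, group layout, and temperature parameter are illustrative assumptions.

```python
import numpy as np

def simplicial_embedding(z, num_groups, temperature=1.0):
    """Map a flat feature vector onto a product of simplices.

    Illustrative sketch: reshape z into `num_groups` equal-sized groups
    and apply a per-group softmax, so each group sums to 1. Lower
    temperatures push each group toward a near-one-hot (sparse,
    discrete-like) code.
    """
    z = np.asarray(z, dtype=np.float64).reshape(num_groups, -1) / temperature
    z = z - z.max(axis=1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    p = e / e.sum(axis=1, keepdims=True)  # each row now lies on a simplex
    return p.reshape(-1)
```

With a low temperature, each group concentrates mass on one coordinate, which is the sparse, near-discrete behavior the abstract attributes to the representation.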