We uncover a surprising phenomenon in deep reinforcement learning: training a diverse ensemble of data-sharing agents -- a well-established exploration strategy -- can significantly impair the performance of the individual ensemble members when compared to standard single-agent training. Through careful analysis, we attribute the degradation in performance to the low proportion of self-generated data in the shared training data for each ensemble member, as well as the inefficiency with which the individual ensemble members learn from such highly off-policy data. We thus name this phenomenon the curse of diversity. We find that several intuitive solutions -- such as a larger replay buffer or a smaller ensemble size -- either fail to consistently mitigate the performance loss or undermine the advantages of ensembling. Finally, we demonstrate, in both discrete and continuous control domains, the potential of representation learning to counteract the curse of diversity with a novel method named Cross-Ensemble Representation Learning (CERL). Our work offers valuable insights into an unexpected pitfall in ensemble-based exploration and raises important caveats for future applications of similar approaches.
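To make the "low proportion of self-generated data" concrete, the following is a minimal sketch, not the experimental setup of this work. It assumes that ensemble members take turns acting in round-robin order and write into a single shared FIFO replay buffer; the function names `fill_shared_buffer` and `self_generated_fraction` are hypothetical and chosen only for illustration.

```python
from collections import deque


def fill_shared_buffer(num_members, num_transitions, capacity):
    """Fill a shared FIFO replay buffer (assumed round-robin acting order).

    Only the id of the member that generated each transition is stored,
    since that is all this illustration needs.
    """
    buffer = deque(maxlen=capacity)
    for t in range(num_transitions):
        buffer.append(t % num_members)  # member id that produced transition t
    return buffer


def self_generated_fraction(buffer, member):
    """Fraction of transitions in the buffer that `member` generated itself."""
    return sum(1 for owner in buffer if owner == member) / len(buffer)


if __name__ == "__main__":
    for n in (1, 2, 5, 10):
        buf = fill_shared_buffer(n, num_transitions=10_000, capacity=1_000)
        print(f"ensemble size {n}: self-generated fraction "
              f"{self_generated_fraction(buf, member=0):.2f}")
```

Under these assumptions, each member sees roughly a 1/N share of its own data for an ensemble of size N, so the data becomes more off-policy for every member as the ensemble grows. In this toy setting the share does not depend on the buffer capacity.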