In this work, we propose a novel algorithmic framework for data sharing and coordinated exploration, with the aim of learning more data-efficient and better-performing policies in a concurrent reinforcement learning (CRL) setting. In contrast to prior work, which assumes that all agents act in identical environments, we relax this restriction and consider a formulation in which each agent acts within an environment that shares a global structure but also exhibits individual variations. Our algorithm leverages a causal inference algorithm, the Additive Noise Model - Mixture Model (ANM-MM), to extract the model parameters governing individual differences via independence enforcement. We propose a new data sharing scheme based on a similarity measure of the extracted model parameters and demonstrate superior learning speeds on a set of autoregressive, pendulum, and cart-pole swing-up tasks. Finally, we show the effectiveness of diverse action selection between common agents under a sparse reward setting. To the best of our knowledge, this is the first work to consider non-identical environments in CRL and one of the few works that seek to integrate causal inference with reinforcement learning (RL).
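As a rough illustration of what similarity-based data sharing could look like, the following minimal sketch assumes that each agent already has an estimated parameter vector (for example, one extracted by ANM-MM) and a replay buffer of transitions. The Gaussian kernel, the `bandwidth` parameter, and the function names `similarity_weights` and `sample_shared_batch` are all hypothetical choices made for this example; the abstract does not specify the actual similarity measure or sharing rule.

```python
import math
import random


def similarity_weights(thetas, bandwidth=1.0):
    """Row-normalised Gaussian-kernel similarities between agents' parameters.

    The kernel is an assumption for illustration, not the paper's measure.
    """
    weights = []
    for a in thetas:
        row = []
        for b in thetas:
            sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
            row.append(math.exp(-sq_dist / (2 * bandwidth ** 2)))
        total = sum(row)
        weights.append([w / total for w in row])
    return weights


def sample_shared_batch(agent, buffers, weights, batch_size, rng):
    """Draw a batch for `agent`, choosing each source agent by similarity.

    Every buffer is assumed to be a non-empty list of transitions.
    """
    sources = rng.choices(range(len(buffers)), weights=weights[agent], k=batch_size)
    return [rng.choice(buffers[s]) for s in sources]


# Hypothetical setup: agents 0 and 1 have similar environments, agent 2 does not.
thetas = [(0.0,), (0.1,), (5.0,)]
buffers = [["a0_t0", "a0_t1"], ["a1_t0", "a1_t1"], ["a2_t0", "a2_t1"]]
W = similarity_weights(thetas)
batch = sample_shared_batch(0, buffers, W, batch_size=8, rng=random.Random(0))
```

Under these assumptions, agent 0 draws most of its shared data from itself and agent 1, and almost none from agent 2, whose estimated parameters are far away.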