Multi-agent reinforcement learning is a key method for training multi-robot systems. Robots are rewarded or penalised according to their performance over a series of episodes, and the resulting policies can then be deployed in the real world. However, poorly trained policies can lead to unsafe behaviour during the early stages of training. We introduce Multi-Agent Reinforcement Learning guided by language-based Inter-robot Negotiation (MARLIN), a hybrid framework in which large language models provide high-level planning before the reinforcement learning policy has learned effective behaviours. Robots use language models to negotiate actions and generate plans that guide policy learning. The system dynamically switches between reinforcement learning and language-model-based negotiation during training, enabling safer and more effective exploration. MARLIN is evaluated on both simulated and physical robots, using local and remote language models. Results show that, compared to standard multi-agent reinforcement learning, the hybrid approach achieves higher performance in early training without reducing final performance. The code is available at https://github.com/SooratiLab/MARLIN.
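The abstract does not state the criterion used to switch between negotiation and the learned policy. As a rough illustration only, the sketch below assumes one plausible rule: language-model negotiation drives each episode until a rolling mean of the policy's estimated score reaches a threshold. The names `choose_source`, `run_training`, `estimate_policy_score`, `threshold`, and `window` are hypothetical and are not taken from the MARLIN code base.

```python
from collections import deque
from typing import Callable, Deque, List


def choose_source(recent_scores: Deque[float], threshold: float) -> str:
    """Pick who plans the next episode (assumed rule, not MARLIN's own).

    Returns "negotiation" until the rolling mean of the policy's
    estimated scores reaches `threshold`, then returns "policy".
    """
    if not recent_scores:
        # No evidence yet that the policy is competent.
        return "negotiation"
    mean_score = sum(recent_scores) / len(recent_scores)
    return "policy" if mean_score >= threshold else "negotiation"


def run_training(
    num_episodes: int,
    estimate_policy_score: Callable[[int], float],
    threshold: float = 0.5,
    window: int = 3,
) -> List[str]:
    """Record which planner is selected in each episode.

    `estimate_policy_score` is a hypothetical stand-in for whatever
    evaluation signal a real system would use.
    """
    recent: Deque[float] = deque(maxlen=window)
    sources: List[str] = []
    for episode in range(num_episodes):
        sources.append(choose_source(recent, threshold))
        # A real system would run the episode with the chosen planner here
        # and use the collected experience to update the policy.
        recent.append(estimate_policy_score(episode))
    return sources


if __name__ == "__main__":
    # Toy signal: the policy's estimated score improves linearly.
    print(run_training(6, lambda episode: episode * 0.2))
```

Under this assumed rule the planner hands over to the learned policy only once its recent performance is sustained, which is one simple way to keep early exploration guided by negotiated plans.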