Human-in-the-loop reinforcement learning integrates human expertise to accelerate agent learning and provide critical guidance and feedback in complex domains. However, many existing approaches focus on single-agent tasks and require continuous human involvement during training, which significantly increases human workload and limits scalability. In this paper, we propose HARP (Human-Assisted Regrouping with Permutation Invariant Critic), a multi-agent reinforcement learning framework designed for group-oriented tasks. HARP integrates automatic agent regrouping with strategic human assistance during deployment, allowing non-experts to offer effective guidance with minimal intervention. During training, agents dynamically adjust their groupings to optimize collaborative task completion. When deployed, they actively seek human assistance and use the Permutation Invariant Group Critic to evaluate and refine human-proposed groupings, so that non-expert users can contribute valuable suggestions. Across multiple collaboration scenarios, our approach leverages limited guidance from non-experts to improve performance. The project can be found at https://github.com/huawen-hu/HARP.
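To make the idea of a permutation-invariant group critic concrete, the following is a minimal sketch, not the HARP implementation. The abstract does not specify the critic's architecture, so this example assumes a DeepSets-style design: a shared per-agent encoder, mean pooling over group members, and a small value head. The names `GroupCritic`, `score_grouping`, and `make_layer`, the layer sizes, and the random (untrained) weights are all hypothetical and chosen only for illustration.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Random dense layer (weights, biases); a stand-in for trained parameters."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(layer, x, activation=True):
    """Apply a dense layer to vector x, optionally followed by tanh."""
    weights, biases = layer
    out = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [math.tanh(v) for v in out] if activation else out


class GroupCritic:
    """Hypothetical critic whose output ignores the order of agents in a group."""

    def __init__(self, obs_dim, hidden_dim=8, seed=0):
        rng = random.Random(seed)
        self.phi = make_layer(obs_dim, hidden_dim, rng)  # shared per-agent encoder
        self.rho = make_layer(hidden_dim, 1, rng)        # value head on pooled features

    def value(self, group_obs):
        """Value of one non-empty group, given a list of per-agent observation vectors."""
        embedded = [dense(self.phi, obs) for obs in group_obs]
        # Mean pooling is symmetric in its inputs, which is what makes the
        # critic invariant to the ordering of agents within the group.
        pooled = [sum(column) / len(embedded) for column in zip(*embedded)]
        return dense(self.rho, pooled, activation=False)[0]


def score_grouping(critic, observations, grouping):
    """Score a candidate grouping (list of lists of agent indices) as a sum of group values."""
    return sum(critic.value([observations[i] for i in group]) for group in grouping)


if __name__ == "__main__":
    critic = GroupCritic(obs_dim=3, seed=1)
    observations = [[0.1, 0.2, 0.3], [1.0, -1.0, 0.5], [0.0, 0.4, -0.7], [0.9, 0.9, 0.1]]
    current = [[0, 1], [2, 3]]    # grouping the agents arrived at on their own
    proposed = [[0, 2], [1, 3]]   # grouping suggested by a human
    print("current :", score_grouping(critic, observations, current))
    print("proposed:", score_grouping(critic, observations, proposed))
```

Under these assumptions, a human-proposed grouping could be compared against the agents' current grouping by scoring both and keeping the higher one. Because pooling is symmetric, listing the same agents in a different order yields the same score up to floating-point rounding, so a non-expert does not need to care about agent ordering when suggesting a group.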