This paper addresses the challenge of allocating heterogeneous resources among multiple agents in a decentralized manner. Our proposed method, LGTC-IPPO, builds upon Independent Proximal Policy Optimization (IPPO) by integrating dynamic cluster consensus, a mechanism that allows agents to form and adapt local sub-teams based on resource demands. This decentralized coordination strategy reduces reliance on global information and enhances scalability. We evaluate LGTC-IPPO against standard multi-agent reinforcement learning baselines and a centralized expert solution across a range of team sizes and resource distributions. Experimental results demonstrate that LGTC-IPPO achieves more stable rewards, better coordination, and robust performance even as the number of agents or resource types increases. Additionally, we illustrate how dynamic clustering enables agents to reallocate resources efficiently, including in scenarios with discharging resources.