This paper addresses the challenge of allocating heterogeneous resources among multiple agents in a decentralized manner. Our proposed method, Liquid-Graph-Time Clustering-IPPO (LGTC-IPPO), builds upon Independent Proximal Policy Optimization (IPPO) by integrating dynamic cluster consensus, a mechanism that allows agents to form and adapt local sub-teams based on resource demands. This decentralized coordination strategy reduces reliance on global information and enhances scalability. We evaluate LGTC-IPPO against standard multi-agent reinforcement learning baselines and a centralized expert solution across a range of team sizes and resource distributions. Experimental results demonstrate that LGTC-IPPO achieves more stable rewards, better coordination, and robust performance even as the number of agents or resource types increases. Additionally, we illustrate how dynamic clustering enables agents to reallocate resources efficiently, including in scenarios with discharging resources.