Enforcing fair workload distribution in heterogeneous multi-agent systems that pursue shared objectives remains challenging. Fixed fairness penalties often introduce inefficiencies, training instability, and conflicting agent incentives. Reward-shaping approaches in fair Multi-Agent Reinforcement Learning (MARL) typically incorporate fairness through heuristic penalties or scalar reward modifications and often rely on post-hoc evaluation. As a result, they do not guarantee that a desired fairness level is satisfied. To address this limitation, we propose Adaptive Fairness Multi-Agent Reinforcement Learning (AdaFair-MARL), a constrained cooperative MARL framework that formulates workload fairness as an explicit constraint, so that agents maintain balanced contributions while optimizing team performance. Its core algorithmic component is a primal-dual update that enforces the fairness constraint through adaptive Lagrange multipliers. Grounding the framework in a cooperative Markov game, we derive the fairness constraint from the geometry of Jain's Fairness Index (JFI) and show that the resulting feasible set admits a second-order cone representation, which enables principled Lagrangian dual-ascent updates without manual penalty tuning. Experiments in a simulated hospital coordination environment (MARLHospital) show that, compared with reward-shaping and fixed-penalty fairness methods, AdaFair-MARL improves workload balance while maintaining team performance. It achieves near-perfect constraint satisfaction (0.99 to 1.00) and significantly higher workload fairness than fixed-penalty baselines.
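The cone representation and the multiplier update can be illustrated with a small sketch. Assume per-agent workloads are summarized as a non-negative, non-zero vector x of length n and the fairness threshold is τ ∈ (0, 1]. The JFI constraint J(x) = (Σᵢ xᵢ)² / (n‖x‖₂²) ≥ τ then rearranges to ‖x‖₂ ≤ 1ᵀx / √(nτ), which is a standard second-order cone constraint. The Python (NumPy) code below checks this equivalence and applies a projected update to a single Lagrange multiplier. The function names `jain_index`, `soc_slack`, and `dual_step`, the slack form, the threshold, and the step size are hypothetical choices for illustration, not the AdaFair-MARL implementation.

```python
import numpy as np

# Hypothetical helpers for illustration only; not the AdaFair-MARL code.


def jain_index(x) -> float:
    """Jain's Fairness Index J(x) = (sum_i x_i)^2 / (n * sum_i x_i^2).

    For a non-negative, non-zero vector x of length n, J(x) lies in [1/n, 1],
    and J(x) = 1 exactly when all entries are equal.
    """
    x = np.asarray(x, dtype=float)
    return float(x.sum() ** 2 / (x.size * np.square(x).sum()))


def soc_slack(x, tau: float) -> float:
    """Slack of the cone constraint ||x||_2 <= (1^T x) / sqrt(n * tau).

    For non-negative, non-zero x and tau in (0, 1], slack >= 0 iff J(x) >= tau.
    """
    x = np.asarray(x, dtype=float)
    return float(x.sum() / np.sqrt(x.size * tau) - np.linalg.norm(x))


def dual_step(lam: float, slack: float, lr: float) -> float:
    """One projected gradient step on the multiplier of the constraint slack >= 0.

    With Lagrangian L = team_reward + lam * slack, dL/dlam = slack, so lam grows
    when the constraint is violated (slack < 0) and shrinks toward 0 when it
    holds (slack > 0). Projection keeps lam >= 0.
    """
    return max(0.0, lam - lr * slack)


if __name__ == "__main__":
    tau = 0.9                               # assumed fairness threshold
    unbalanced = np.array([5.0, 3.0, 1.0])  # hypothetical per-agent workloads
    balanced = np.array([3.0, 3.0, 3.0])

    lam = 0.0
    for x in (unbalanced, balanced):
        s = soc_slack(x, tau)
        lam = dual_step(lam, s, lr=0.5)     # assumed dual step size
        print(f"J={jain_index(x):.3f} slack={s:+.3f} lambda={lam:.3f}")
```

Here the multiplier rises after the unbalanced allocation violates the constraint and decays once the allocation becomes balanced. This behavior is what replaces a hand-tuned fixed penalty weight in a primal-dual scheme. In a full MARL setting, the slack would instead be estimated from rollouts and added to the policy objective with weight λ.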