In edge computing systems, autonomous agents must make fast local decisions while competing for shared resources. Existing MARL methods often resort to centralized critics or frequent communication, which fail under limited observability and communication constraints. We propose a decentralized framework in which each agent solves a constrained Markov decision process (CMDP), coordinating implicitly through a shared constraint vector. In the task-offloading setting, for example, these constraints prevent overloading shared server resources. The constraint vector is updated infrequently and serves as a lightweight coordination mechanism: it lets agents align with global resource-usage objectives while requiring little direct communication. Using safe reinforcement learning, agents learn policies that satisfy both local and global objectives. We establish theoretical guarantees under mild assumptions and validate the approach experimentally, showing improved performance over centralized and independent baselines, especially in large-scale settings.
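For concreteness, a minimal sketch of the per-agent problem as we read it from the abstract; the notation ($r_i$, $c_i$, $d_i$, $\gamma$) is introduced here for illustration, and the paper's exact formulation may differ. Agent $i$ maximizes its local return subject to a cost budget given by its component $d_i$ of the shared constraint vector $d$:

$$
\max_{\pi_i}\;\mathbb{E}_{\pi_i}\!\left[\sum_{t=0}^{\infty} \gamma^t\, r_i(s_t, a_t)\right]
\quad \text{s.t.} \quad
\mathbb{E}_{\pi_i}\!\left[\sum_{t=0}^{\infty} \gamma^t\, c_i(s_t, a_t)\right] \le d_i,
$$

where $c_i$ measures agent $i$'s consumption of the shared resource. Because only $d$ is shared, and it changes infrequently, coordination requires little direct communication.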