Carbon-aware decentralized dynamic task offloading in MIMO-MEC networks via multi-agent reinforcement learning

Massive Internet of Things (IoT) microservices require integrating renewable energy harvesting into mobile edge computing (MEC) to build sustainable eScience infrastructures. Spatiotemporal mismatches between stochastic task arrivals and intermittent green energy, together with complex inter-user interference in multi-antenna (MIMO) uplinks, complicate real-time resource management. Traditional centralized optimization and off-policy reinforcement learning struggle with scalability and signaling overhead in dense networks. This paper proposes CADDTO-PPO, a carbon-aware decentralized dynamic task offloading framework based on multi-agent proximal policy optimization (PPO). The multi-user MIMO-MEC system is modeled as a decentralized partially observable Markov decision process (DEC-POMDP) to jointly minimize carbon emissions, buffer latency, and energy wastage. A scalable architecture uses decentralized execution with parameter sharing (DEPS), which enables autonomous IoT agents to make fine-grained power control and offloading decisions based solely on local observations. Additionally, a carbon-first reward structure adaptively prioritizes green time slots for data transmission, decoupling system throughput from the grid-dependent carbon footprint. Experimental results show that CADDTO-PPO outperforms deep deterministic policy gradient (DDPG) and Lyapunov-based baselines: the framework achieves the lowest carbon intensity and maintains near-zero packet overflow rates under extreme traffic loads. Architectural profiling indicates constant $O(1)$ inference complexity, supporting the framework's theoretical lightweight feasibility for future-generation sustainable IoT deployments.
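To make the two central ideas of the abstract concrete, the following is a minimal illustrative sketch, not the authors' implementation. It shows (i) decentralized execution with parameter sharing, where every agent applies the same policy parameters to its own local observation, and (ii) a carbon-weighted reward in which the carbon term dominates the latency and waste terms. The observation fields, the linear-sigmoid policy, the function names, and all weight values are assumptions chosen for illustration. The abstract does not specify them, and the PPO training loop is omitted.

```python
import math
from dataclasses import dataclass


@dataclass
class LocalObs:
    """Hypothetical per-agent local observation (fields are assumptions)."""
    buffer_fill: float     # fraction of the task buffer in use, in [0, 1]
    green_fraction: float  # renewable share of available energy, in [0, 1]
    channel_gain: float    # normalized uplink channel quality, in [0, 1]


class SharedPolicy:
    """A single parameter set reused by every agent (parameter sharing)."""

    def __init__(self, weights, bias):
        self.weights = weights
        self.bias = bias

    def act(self, obs: LocalObs) -> float:
        """Map one local observation to a transmit-power fraction in (0, 1)."""
        z = (self.weights[0] * obs.buffer_fill
             + self.weights[1] * obs.green_fraction
             + self.weights[2] * obs.channel_gain
             + self.bias)
        return 1.0 / (1.0 + math.exp(-z))  # sigmoid squashing


def carbon_first_reward(grid_energy, carbon_intensity, backlog, wasted_green,
                        w_carbon=10.0, w_latency=1.0, w_waste=0.5):
    """Negative weighted cost; w_carbon is largest so carbon dominates.

    All weights are illustrative assumptions, not values from the paper.
    """
    carbon = grid_energy * carbon_intensity  # emissions from grid energy only
    return -(w_carbon * carbon + w_latency * backlog + w_waste * wasted_green)


# Decentralized execution: each agent evaluates the shared policy on its own
# observation only; no agent sees another agent's state.
policy = SharedPolicy(weights=[2.0, 1.5, 1.0], bias=-2.0)
observations = [
    LocalObs(buffer_fill=0.9, green_fraction=0.8, channel_gain=0.7),
    LocalObs(buffer_fill=0.1, green_fraction=0.2, channel_gain=0.3),
]
actions = [policy.act(obs) for obs in observations]

# Same backlog, but one slot draws no grid energy and the other does.
reward_green_slot = carbon_first_reward(0.0, 0.5, backlog=0.2, wasted_green=0.0)
reward_grid_slot = carbon_first_reward(1.0, 0.5, backlog=0.2, wasted_green=0.0)
```

In this sketch, the per-agent cost of producing an action is one pass through a fixed-size parameter set and does not depend on how many agents exist, which is one way to read the constant inference complexity mentioned above. The reward comparison at the end illustrates how a dominant carbon weight makes transmitting in a green slot preferable to transmitting on grid power for the same backlog.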
