DECOFFEE: Decentralized Reinforcement Learning for Time-critical Workload Offloading and Energy Efficiency across the Computing Continuum

The rapid proliferation of latency-sensitive and battery-constrained Internet-of-Things (IoT) applications has intensified the need for intelligent workload placement across the Edge-Cloud computing continuum. In such environments, far-edge nodes must decide dynamically whether to execute each workload locally or offload it to a neighboring node or the cloud, while accounting for execution delay, energy consumption, and strict timeout constraints. Workload placement in large-scale distributed infrastructures is, however, a highly dynamic and non-convex optimization problem because of stochastic arrivals, heterogeneous computing capacities, and time-varying network conditions. This paper proposes DECOFFEE, a decentralized reinforcement learning framework for time-critical workload offloading and energy-efficient operation across the computing continuum. The proposed multi-agent learning scheme jointly optimizes system delay, energy consumption, and workload drop rate through adaptive placement decisions. Each edge agent operates as an autonomous learner that derives its placement policy from local system observations and predicted network conditions. The workload placement process is formulated as a set of parallel Markov Decision Processes and solved with a Double Dueling Deep Q-Network (Double Dueling DQN) architecture, enhanced with Long Short-Term Memory (LSTM) forecasting to anticipate future load conditions. Extensive simulations show that DECOFFEE and its variants consistently outperform conventional rule-based and heuristic placement strategies, reducing delay, energy consumption, and workload drop rate under varying traffic and network conditions.
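To make the learning core more concrete, the following minimal sketch illustrates the two standard ingredients that the name "Double Dueling DQN" refers to: the dueling aggregation Q(s, a) = V(s) + A(s, a) - mean(A), and the Double DQN target, in which the online network selects the next action and the target network evaluates it. It is not the authors' implementation. The function names, the three-action set (local, neighbor, cloud), the reward of -1.0 (standing in for a combined delay and energy cost), and all numeric values are assumptions chosen for illustration, and the LSTM forecaster is omitted.

```python
# Illustrative sketch only: plain-Python versions of the dueling aggregation
# and the Double DQN target. All names and numbers are hypothetical.

ACTIONS = ["local", "neighbor", "cloud"]  # assumed action set for one edge agent


def dueling_q_values(value, advantages):
    """Combine a state value V(s) and per-action advantages A(s, a) into Q(s, a).

    Subtracting the mean advantage keeps V and A identifiable.
    """
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def double_dqn_target(reward, gamma, q_online_next, q_target_next, done=False):
    """Double DQN target: the online network picks the action,
    the target network supplies its value."""
    if done:
        return reward
    best = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return reward + gamma * q_target_next[best]


# Next-state estimates from the (hypothetical) online and target networks.
q_online_next = dueling_q_values(1.0, [0.5, -0.5, 0.0])   # [1.5, 0.5, 1.0]
q_target_next = dueling_q_values(2.0, [1.0, 2.0, -3.0])   # [3.0, 4.0, -1.0]

# The reward is assumed to be a negative cost (e.g. weighted delay + energy).
y_double = double_dqn_target(-1.0, 0.5, q_online_next, q_target_next)
y_vanilla = -1.0 + 0.5 * max(q_target_next)

print(ACTIONS[q_online_next.index(max(q_online_next))], y_double, y_vanilla)
```

In this toy case the online network prefers "local", so the Double DQN target uses the target network's value for that action (3.0) rather than the target network's own maximum (4.0), which is how the double estimator reduces overestimation relative to a vanilla DQN target.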
