HiRL: Hierarchical Reinforcement Learning for Coordinated Resource Management in Heterogeneous Edge Computing

Edge computing faces unprecedented resource orchestration challenges arising from multi-dimensional heterogeneity: diverse device architectures, varied task requirements (CPU-intensive, GPU-intensive, and I/O-intensive), and dynamic network conditions. Edge environments demand real-time task processing within strict energy budgets, yet conventional approaches struggle to solve the resulting mixed continuous-discrete optimization problem while meeting deadline and energy constraints. This paper presents HiRL, a hierarchical reinforcement learning framework that decomposes complex resource orchestration into coordinated power control and task allocation decisions. Our approach handles continuous power management with Twin Delayed Deep Deterministic Policy Gradient (TD3) and discrete task placement with Double Deep Q-Network (DDQN), and unifies the two through a coordination engine with a five-dimensional queue state representation. We propose a heterogeneity-aware assessment of resource compatibility, combined with deadline-oriented prioritization and failure-penalized adaptive sampling, to improve decision quality under resource constraints. To improve practical applicability, the framework models comprehensive system dynamics, including device mobility, queue congestion patterns, infrastructure heterogeneity, and priority-sensitive scheduling demands. Experimental results show that HiRL achieves an effective latency-energy trade-off, reducing latency by 28% relative to Single-DDQN while maintaining task completion rates of nearly 100% under all load conditions. Compared to baseline algorithms, HiRL reduces energy consumption by up to 51% under low load and improves latency by 24% over static optimization approaches under high load, demonstrating effective resource orchestration in heterogeneous edge environments.
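To make the split between the two learners concrete, the following is a minimal sketch of the standard bootstrap targets used by DDQN (discrete task placement) and TD3 (continuous power control). It is not the HiRL implementation: the abstract does not specify the update rules, network shapes, or hyperparameters, so the function names, the discount factor, the noise settings, and the normalized power range [0, 1] are all assumptions made for illustration.

```python
import numpy as np

def ddqn_target(reward, done, q_online_next, q_target_next, gamma=0.99):
    """Double DQN target: the online network selects the next action,
    the target network evaluates it."""
    best_action = np.argmax(q_online_next, axis=1)
    next_value = q_target_next[np.arange(len(best_action)), best_action]
    return reward + gamma * (1.0 - done) * next_value

def td3_target(reward, done, q1_target_next, q2_target_next, gamma=0.99):
    """TD3 target: clipped double-Q, i.e. the minimum of the two target critics."""
    next_value = np.minimum(q1_target_next, q2_target_next)
    return reward + gamma * (1.0 - done) * next_value

def td3_smoothed_action(target_action, rng, sigma=0.2, noise_clip=0.5,
                        low=0.0, high=1.0):
    """TD3 target policy smoothing: add clipped Gaussian noise to the target
    action, then clip to the action range (assumed here to be a normalized
    power level in [0, 1])."""
    noise = np.clip(rng.normal(0.0, sigma, size=target_action.shape),
                    -noise_clip, noise_clip)
    return np.clip(target_action + noise, low, high)

# Toy batch of two transitions; all values are made up for illustration.
reward = np.array([1.0, 0.5])
done = np.array([0.0, 1.0])  # the second transition is terminal

# Q-values over three hypothetical candidate devices for task placement.
q_online_next = np.array([[0.2, 0.9, 0.1], [0.4, 0.3, 0.8]])
q_target_next = np.array([[0.5, 0.6, 0.7], [0.1, 0.2, 0.3]])
y_placement = ddqn_target(reward, done, q_online_next, q_target_next)

# Critic estimates for the smoothed next power action.
q1_next = np.array([1.0, 2.0])
q2_next = np.array([0.8, 3.0])
y_power = td3_target(reward, done, q1_next, q2_next)

rng = np.random.default_rng(0)
smoothed = td3_smoothed_action(np.array([[0.1], [0.95]]), rng)
```

In the first transition the online network prefers device 1, so the placement target bootstraps from the target network's value for that device (0.6) rather than from its own maximum (0.7). This decoupling of action selection from evaluation is what distinguishes DDQN from plain DQN. The power target likewise takes the smaller of the two critic estimates to limit overestimation.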
