With the uptake of intelligent data-driven applications, edge computing infrastructures require a new generation of admission control algorithms that maximize system performance under limited and highly heterogeneous resources. In this paper, we study how to optimally select information flows belonging to different classes and dispatch them to multiple edge servers, where deployed applications perform flow analytics tasks. We obtain the optimal policy via constrained Markov decision process (CMDP) theory. The formulation accounts for each edge application's demand for specific classes of flows, the computing capacity of the edge servers, and the capacity of the access network. We develop DR-CPO, a specialized primal-dual Safe Reinforcement Learning (SRL) method that solves the resulting optimal admission control problem through reward decomposition. DR-CPO performs optimal decentralized control and effectively mitigates state-space explosion while preserving optimality. Extensive results show that, compared with existing Deep Reinforcement Learning (DRL) solutions, DR-CPO achieves 15\% higher reward across a wide variety of environments. It also requires, on average, only 50\% of the learning episodes to converge. Finally, we show how to combine DR-CPO with load balancing to optimally dispatch information streams to the available edge servers and further improve system performance.
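The abstract does not spell out DR-CPO's update rules. The sketch below illustrates only the general Lagrangian (primal-dual) idea that CMDP solvers build on, and it is not DR-CPO itself. It uses a hypothetical, stateless admission toy with three flow classes and one arrival per class per step, both of which are assumptions. The toy has a single capacity constraint, and the multiplier `mu` is updated by a dual subgradient step. Pricing the constraint as a per-flow penalty `reward - mu * cost` lets each class decide independently, loosely in the spirit of decentralized control. The names `FlowClass` and `lagrangian_admission`, and all numbers, are hypothetical.

```python
# Hypothetical toy, NOT the DR-CPO algorithm: a stateless admission problem
# solved by Lagrangian relaxation with a dual subgradient update on mu.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FlowClass:
    reward: float  # reward per admitted flow of this class (assumed value)
    cost: float    # computing load per admitted flow (assumed value)


def lagrangian_admission(
    classes: List[FlowClass], capacity: float, step: float = 0.1, iters: int = 2000
) -> Tuple[float, List[float]]:
    """Return (final multiplier, time-averaged admission rate per class).

    Assumes one arrival per class per step. For a fixed mu, the Lagrangian
    sum_k p_k * (r_k - mu * c_k) + mu * C separates by class, so each class
    admits its flow iff its penalized reward is strictly positive.
    """
    mu = 0.0
    admitted_counts = [0] * len(classes)
    for _ in range(iters):
        # Primal step: per-class best response to the current price mu.
        decisions = [1 if fc.reward - mu * fc.cost > 0 else 0 for fc in classes]
        load = sum(d * fc.cost for d, fc in zip(decisions, classes))
        for k, d in enumerate(decisions):
            admitted_counts[k] += d
        # Dual step: raise mu when capacity is exceeded, lower it when there
        # is slack, and project onto mu >= 0.
        mu = max(0.0, mu + step * (load - capacity))
    # Instantaneous decisions oscillate around the threshold, so report the
    # time-averaged admission rates (primal averaging).
    avg_rates = [n / iters for n in admitted_counts]
    return mu, avg_rates


# Usage on a hypothetical three-class instance with capacity 1.5.
classes = [FlowClass(3.0, 1.0), FlowClass(2.0, 1.0), FlowClass(1.0, 1.0)]
mu, rates = lagrangian_admission(classes, capacity=1.5)
avg_reward = sum(r * fc.reward for r, fc in zip(rates, classes))
print(f"mu={mu:.2f} rates={[round(r, 3) for r in rates]} reward={avg_reward:.3f}")
```

The best stationary policy for this toy is a fractional knapsack. It admits class 0 always and class 1 half the time, which gives an average reward of 4 at load 1.5 with multiplier 2. The time-averaged rates from the loop settle close to that point, while the per-step decisions keep switching around the price threshold.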