With the uptake of intelligent data-driven applications, edge computing infrastructures require a new generation of admission control algorithms to maximize system performance under limited and highly heterogeneous resources. In this paper, we study how to optimally select information flows belonging to different classes and dispatch them to multiple edge servers, where deployed applications perform flow analytic tasks. The optimal policy is obtained via constrained Markov decision process (CMDP) theory, accounting for the demand of each edge application for specific classes of flows and for the constraints on the computing capacity of the edge servers and of the access network. We develop DR-CPO, a specialized primal-dual Safe Reinforcement Learning (SRL) method that solves the resulting optimal admission control problem via reward decomposition. DR-CPO enables decentralized control and effectively mitigates state-space explosion while preserving optimality. Extensive results show that, compared to existing Deep Reinforcement Learning (DRL) solutions, DR-CPO achieves 15\% higher reward across a wide variety of environments while requiring, on average, only 50\% of the learning episodes to converge. Finally, we show how to combine DR-CPO with load balancing to optimally dispatch information streams to the available edge servers and further improve system performance.