DHRL-FNMR: An Intelligent Multicast Routing Approach Based on Deep Hierarchical Reinforcement Learning in SDN

Finding the optimal multicast tree in Software-Defined Networking (SDN) multicast routing is an NP-hard combinatorial optimization problem. Existing intelligent SDN solutions based on deep reinforcement learning can adapt dynamically to complex changes in network link state, but they suffer from redundant branches, large action spaces, and slow agent convergence. This paper proposes an intelligent SDN multicast routing algorithm based on deep hierarchical reinforcement learning to address these problems. First, the multicast tree construction problem is decomposed into two sub-problems: selecting the fork node, and constructing the optimal path from the fork node to the destination node. Second, drawing on the global network view available in SDN, the multicast tree state matrix, link bandwidth matrix, link delay matrix, link packet loss rate matrix, and sub-goal matrix are designed as the state space of the intrinsic controller and the meta-controller. Third, to reduce the excessive action space, the approach constructs different action spaces at the upper and lower levels: the meta-controller uses the network nodes as its action space to select the fork node, and the intrinsic controller uses the edges adjacent to the current node as its action space, thereby implementing four different action selection strategies in the construction of the multicast tree. Finally, to help the agent construct the optimal multicast tree more quickly, we develop alternative reward strategies that distinguish between single-step node actions and multi-step actions towards multiple destination nodes.
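The state and action spaces described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 5-node topology, the link values, the matrix encodings (for example, marking the sub-goal on the diagonal), and all function names are assumptions made for illustration only.

```python
# Illustrative sketch only. The topology, link values, encodings, and names
# below are hypothetical and are not taken from the paper.

# Each undirected link maps to (bandwidth_mbps, delay_ms, loss_rate).
LINKS = {
    (0, 1): (100.0, 2.0, 0.01),
    (0, 2): (50.0, 5.0, 0.02),
    (1, 3): (80.0, 3.0, 0.00),
    (2, 3): (60.0, 4.0, 0.01),
    (3, 4): (90.0, 1.0, 0.00),
}
NUM_NODES = 5


def zeros():
    """Return an N x N matrix of zeros."""
    return [[0.0] * NUM_NODES for _ in range(NUM_NODES)]


def link_matrix(attr_index):
    """Symmetric matrix holding one link attribute (0.0 where no link exists)."""
    m = zeros()
    for (u, v), attrs in LINKS.items():
        m[u][v] = m[v][u] = attrs[attr_index]
    return m


def tree_matrix(tree_edges):
    """Mark the edges already in the partial multicast tree with 1.0."""
    m = zeros()
    for u, v in tree_edges:
        m[u][v] = m[v][u] = 1.0
    return m


def subgoal_matrix(subgoal):
    """Assumed encoding: mark the chosen fork node (sub-goal) on the diagonal."""
    m = zeros()
    if subgoal is not None:
        m[subgoal][subgoal] = 1.0
    return m


def build_state(tree_edges, subgoal=None):
    """Stack the five matrices named in the abstract into one state."""
    return [
        tree_matrix(tree_edges),   # multicast tree state
        link_matrix(0),            # link bandwidth
        link_matrix(1),            # link delay
        link_matrix(2),            # link packet loss rate
        subgoal_matrix(subgoal),   # sub-goal
    ]


def meta_actions():
    """Upper level: every network node is a candidate fork node."""
    return list(range(NUM_NODES))


def intrinsic_actions(node):
    """Lower level: only the edges adjacent to the current node."""
    return sorted(
        (node, v if u == node else u)
        for (u, v) in LINKS
        if node in (u, v)
    )


if __name__ == "__main__":
    state = build_state(tree_edges=[(0, 1)], subgoal=3)
    print(len(state), "state matrices")
    print("meta-controller actions:", meta_actions())
    print("intrinsic actions at node 3:", intrinsic_actions(3))
```

In this sketch the lower-level action count is bounded by the degree of the current node, not by the size of the whole network, which is one way to read the abstract's claim that separate action spaces reduce the excessive action space.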
