Virtual network embedding (VNE) is an essential resource allocation task in network virtualization, aiming to map virtual network requests (VNRs) onto physical infrastructure. Reinforcement learning (RL) has recently emerged as a promising solution to this problem. However, existing RL-based VNE methods are limited by a unidirectional action design and a one-size-fits-all training strategy, which restrict their searchability and generalizability. In this paper, we propose a FLexible And Generalizable RL framework for VNE, named FlagVNE. Specifically, we design a bidirectional action-based Markov decision process model that enables the joint selection of virtual and physical nodes, thus improving the flexibility with which the solution space is explored. To tackle the resulting large and dynamic action space, we design a hierarchical decoder that generates adaptive action probability distributions while keeping training efficient. Furthermore, to overcome the generalization issue across varying VNR sizes, we propose a meta-RL-based training method with a curriculum scheduling strategy, facilitating specialized policy training for each VNR size. Finally, extensive experimental results show the effectiveness of FlagVNE across multiple key metrics. Our code is available on GitHub (https://github.com/GeminiLight/flag-vne).
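To make the idea of jointly selecting a virtual and a physical node more concrete, the following is a minimal sketch of a two-stage (hierarchical) action sampler. It is an illustration under stated assumptions, not the FlagVNE decoder itself: the abstract does not specify the architecture, so the score inputs, the masks, and the function names (`masked_softmax`, `sample_index`, `sample_bidirectional_action`) are all hypothetical. The sketch first picks a virtual node i, then a physical node j conditioned on i, so the joint probability factorizes as p(i, j) = p(i) * p(j | i) and the second stage only ever evaluates one row of the action space.

```python
import math
import random


def masked_softmax(scores, mask):
    """Softmax over entries where mask is True; masked entries get probability 0."""
    valid = [s for s, m in zip(scores, mask) if m]
    if not valid:
        raise ValueError("no valid action under the mask")
    peak = max(valid)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) if m else 0.0 for s, m in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def sample_index(probs, rng):
    """Draw an index according to probs, restricted to entries with nonzero mass."""
    support = [k for k, p in enumerate(probs) if p > 0.0]
    return rng.choices(support, weights=[probs[k] for k in support])[0]


def sample_bidirectional_action(v_scores, p_scores, v_mask, p_mask, rng):
    """Sample a (virtual node, physical node) pair in two stages.

    All inputs are hypothetical placeholders for whatever a policy network emits:
    v_scores[i]    : score for choosing virtual node i
    p_scores[i][j] : score for placing virtual node i on physical node j
    v_mask[i]      : True if virtual node i is still unplaced
    p_mask[i][j]   : True if physical node j can host virtual node i
    Returns the chosen pair and its joint probability p(i) * p(j | i).
    """
    # A virtual node with no feasible host should not be selectable at all.
    v_ok = [m and any(p_mask[i]) for i, m in enumerate(v_mask)]
    v_probs = masked_softmax(v_scores, v_ok)
    i = sample_index(v_probs, rng)

    # Second stage: distribution over physical nodes, conditioned on i.
    p_probs = masked_softmax(p_scores[i], p_mask[i])
    j = sample_index(p_probs, rng)
    return (i, j), v_probs[i] * p_probs[j]
```

Because the masks are recomputed at every step, the action distribution adapts as virtual nodes are placed and physical resources are consumed, which is one plausible reading of "adaptive action probability distributions" in the abstract.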