Virtual network embedding (VNE) is an essential resource allocation task in network virtualization, aiming to map virtual network requests (VNRs) onto physical infrastructure. Reinforcement learning (RL) has recently emerged as a promising solution to this problem. However, existing RL-based VNE methods are limited by their unidirectional action design and one-size-fits-all training strategy, resulting in restricted search capability and generalizability. In this paper, we propose a FLexible And Generalizable RL framework for VNE, named FlagVNE. Specifically, we design a bidirectional action-based Markov decision process model that enables the joint selection of virtual and physical nodes, thus improving the exploration flexibility of the solution space. To tackle the expansive and dynamic action space, we design a hierarchical decoder that generates adaptive action probability distributions and ensures high training efficiency. Furthermore, to overcome the generalization issue across varying VNR sizes, we propose a meta-RL-based training method with a curriculum scheduling strategy, facilitating specialized policy training for each VNR size. Finally, extensive experimental results show the effectiveness of FlagVNE across multiple key metrics. Our code is available on GitHub (https://github.com/GeminiLight/flag-vne).
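To make the bidirectional action idea concrete, the sketch below illustrates how a joint (virtual node, physical node) action can be sampled hierarchically, with the joint probability factorized as p(v, p) = p(v) · p(p | v). This is a minimal illustration only, not the paper's implementation: the scoring inputs, function names, and the plain-Python sampling are all assumptions for exposition.

```python
import math
import random

def softmax(scores):
    # Numerically stable softmax over a list of raw scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sample_bidirectional_action(virtual_scores, physical_scores_by_vnode, rng=None):
    """Sample a joint (virtual node, physical node) action in two stages:
    first a virtual node v from p(v), then a physical node p from p(p | v),
    so the joint probability factorizes as p(v, p) = p(v) * p(p | v).
    `virtual_scores` and `physical_scores_by_vnode` are hypothetical decoder
    outputs (one score per virtual node; one score list per virtual node)."""
    rng = rng or random.Random(0)
    pv = softmax(virtual_scores)
    v = rng.choices(range(len(pv)), weights=pv)[0]
    pp = softmax(physical_scores_by_vnode[v])
    p = rng.choices(range(len(pp)), weights=pp)[0]
    return (v, p), pv[v] * pp[p]
```

Factorizing the distribution this way keeps the per-step choice tractable even when the product action space (all virtual-physical node pairs) is large and changes with each arriving VNR.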