Owing to network delays and scalability limitations, clustered ad hoc networks widely adopt Reinforcement Learning (RL) for on-demand resource allocation. Despite their demonstrated agility, traditional Model-Free RL (MFRL) solutions struggle to tackle the huge action space, which generally explodes exponentially with the number of resource allocation units, resulting in low sampling efficiency and high interaction cost. In contrast to MFRL, Model-Based RL (MBRL) offers an alternative means of boosting sample efficiency and stabilizing training by explicitly leveraging a learned environment model. However, establishing an accurate dynamics model for complex and noisy environments requires a careful balance between model accuracy on one hand and computational complexity and stability on the other. To address these issues, we propose a Conditional Diffusion Model Planner (CDMP) for high-dimensional offline resource allocation in clustered ad hoc networks. By leveraging the remarkable generative capability of Diffusion Models (DMs), our approach accurately models high-quality environmental dynamics while employing an inverse dynamics model to plan a superior policy. Beyond simply adopting DMs in offline RL, we further equip the CDMP algorithm with a theoretically guaranteed, uncertainty-aware penalty metric, which both theoretically and empirically mitigates the Out-of-Distribution (OOD)-induced distribution shift that arises from scarce training data. Extensive experiments also show that our model outperforms MFRL in average reward and Quality of Service (QoS) while achieving performance comparable to other MBRL algorithms.