Ad hoc wireless networks exhibit complex, inherent, and coupled dynamics, namely node mobility, energy depletion, and topology change, that are difficult to model analytically. Model-free deep reinforcement learning requires sustained online interaction, whereas existing model-based approaches use flat state representations that lose per-node structure. We therefore propose G-RSSM, a graph-structured recurrent state-space model that maintains per-node latent states and uses cross-node multi-head attention to learn the dynamics jointly from offline trajectories. We apply the proposed method to a downstream clustering task, in which a cluster-head selection policy is trained entirely through imagined rollouts in the learned world model. Across 27 evaluation scenarios spanning MANET, VANET, FANET, WSN, and tactical networks with N = 30 to 1000 nodes, the learned policy maintains high connectivity despite being trained only at N = 50. This work presents the first multi-physics, graph-structured world model applied to combinatorial per-node decision making in size-agnostic wireless ad hoc networks.
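To make the size-agnostic claim concrete, the following is a minimal NumPy sketch of one recurrent update over per-node latent states with cross-node multi-head attention. It is an illustration under stated assumptions, not the G-RSSM implementation: the function names (`init_params`, `cross_node_attention`, `step`), the dimensions, the tanh update rule, and the omission of the stochastic latent and of any training loss are all hypothetical choices made here. The point it shows is that every weight matrix is shared across nodes, so no parameter shape depends on N.

```python
import numpy as np


def init_params(rng, d_latent=16, d_obs=4, n_heads=4):
    """Hypothetical parameter layout; no shape depends on the node count N."""
    assert d_latent % n_heads == 0
    scale = 0.1
    square = (d_latent, d_latent)
    return {
        "n_heads": n_heads,
        "Wq": rng.normal(0.0, scale, square),  # attention queries
        "Wk": rng.normal(0.0, scale, square),  # attention keys
        "Wv": rng.normal(0.0, scale, square),  # attention values
        "Wh": rng.normal(0.0, scale, square),  # recurrent weights
        "Wm": rng.normal(0.0, scale, square),  # weights for the attended message
        "Wx": rng.normal(0.0, scale, (d_obs, d_latent)),  # per-node input weights
    }


def cross_node_attention(h, p):
    """Multi-head self-attention across the node axis of h, shape (N, d_latent)."""
    n, d = h.shape
    heads = p["n_heads"]
    dk = d // heads

    def split(x):
        # (N, d) -> (heads, N, dk)
        return x.reshape(n, heads, dk).transpose(1, 0, 2)

    q, k, v = split(h @ p["Wq"]), split(h @ p["Wk"]), split(h @ p["Wv"])
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(dk)  # (heads, N, N)
    scores -= scores.max(axis=-1, keepdims=True)     # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    out = weights @ v                                # (heads, N, dk)
    return out.transpose(1, 0, 2).reshape(n, d)      # merge heads -> (N, d)


def step(h, x, p):
    """One deterministic update: each node mixes its own state, its own input
    features, and a message attended from all nodes."""
    m = cross_node_attention(h, p)
    return np.tanh(h @ p["Wh"] + m @ p["Wm"] + x @ p["Wx"])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    params = init_params(rng)
    for n in (50, 1000):  # the same parameters serve both network sizes
        h_next = step(np.zeros((n, 16)), rng.normal(size=(n, 4)), params)
        print(n, h_next.shape)
```

Because the update is built only from shared per-node maps and attention over the node axis, it is also permutation-equivariant: reordering the nodes reorders the output rows in the same way. Dense attention costs O(N²) per step, so a practical model for large N would likely restrict attention to graph neighbours; that restriction is not shown here.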