Despite the popularity of reinforcement learning (RL) in wireless networks, existing approaches based on model-free RL (MFRL) and model-based RL (MBRL) are data-inefficient and short-sighted. Such RL-based solutions cannot generalize to novel network states because they capture only statistical patterns in wireless data rather than the underlying physics and logic. These limitations are particularly severe in complex wireless networks with high dynamics and long-term planning requirements. To address them, this paper proposes a novel dual-mind world model-based learning framework that optimizes the completeness-weighted age of information (CAoI) in a challenging millimeter-wave (mmWave) vehicle-to-everything (V2X) scenario. Inspired by cognitive psychology, the proposed dual-mind world model comprises a pattern-driven System 1 component and a logic-driven System 2 component that learn the dynamics and logic of the wireless network and provide long-term link scheduling over reliable imagined trajectories. Link scheduling is learned through end-to-end differentiable imagined trajectories that remain logically consistent over an extended horizon, rather than from wireless data obtained through environment interactions. Moreover, through imagination rollouts, the proposed world model can jointly reason about network states and plan link scheduling. The proposed method also remains capable of making efficient decisions during intervals without observations. Extensive experiments are conducted on a realistic Sionna-based simulator with real-world physical channels, ray tracing, and scene objects with material properties. Simulation results show that, compared with state-of-the-art RL baselines and a world model with only System 1, the proposed world model achieves a significant improvement in data efficiency as well as strong generalization and adaptation to unseen environments.