DWM-RO: Decentralized World Models with Reasoning Offloading for SWIPT-enabled Satellite-Terrestrial HetNets

Wireless networks are shifting toward massive connectivity with energy-efficient operation, which is driving the integration of satellite-terrestrial architectures with simultaneous wireless information and power transfer (SWIPT). Optimizing transmit beamforming and power splitting in such systems is difficult: time-varying channels and multi-tier interference create a complex decision landscape. In this setting, conventional model-free multi-agent reinforcement learning (MARL) suffers from sample inefficiency, because some state transitions are rarely encountered, and from poor coordination, because decentralized agents act independently. This paper proposes the Decentralized World Model with Reasoning Offloading (DWM-RO) framework to address these limitations. Each agent employs a world model to learn compact predictive representations of the environment dynamics, which enables imagination-based policy training and substantially reduces the number of environment interactions required. An uncertainty-aware offloading gate monitors local interference levels and model reconstruction errors, and triggers edge coordination only when needed. When the gate is activated, a lightweight latent decorrelation mechanism at the edge refines the agents' strategic representations, guiding them toward orthogonal actions that minimize resource conflicts. Extensive simulations show that DWM-RO converges 5 times faster than state-of-the-art baselines while achieving 34.7% higher spectral efficiency and reducing constraint violations by 40%. In dense network scenarios with 10 users, DWM-RO keeps violation rates below 20% while the baselines exceed 70%, which supports its robustness.
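The two coordination steps described above, the offloading gate and the latent decorrelation, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not specify the gate rule, the thresholds, or the decorrelation method, so the sketch uses a simple either-signal threshold test and Gram-Schmidt orthogonalization as a stand-in. The names `should_offload` and `decorrelate` are hypothetical.

```python
def should_offload(interference, recon_error,
                   interference_threshold=1.0, recon_threshold=0.5):
    """Hypothetical gate: request edge coordination when either the local
    interference level or the world-model reconstruction error exceeds
    its (assumed) threshold."""
    return interference > interference_threshold or recon_error > recon_threshold


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def decorrelate(latents):
    """Stand-in for the edge-side latent decorrelation: Gram-Schmidt
    orthogonalization of the agents' latent vectors, so that later agents'
    latents carry no component along earlier agents' latents."""
    refined = []
    for z in latents:
        v = list(z)
        for u in refined:
            uu = dot(u, u)
            if uu > 1e-12:  # skip directions that collapsed to zero
                coeff = dot(v, u) / uu
                v = [a - coeff * b for a, b in zip(v, u)]
        refined.append(v)
    return refined


if __name__ == "__main__":
    # Two agents with overlapping latent representations (made-up values).
    latents = [[1.0, 0.0], [1.0, 1.0]]
    if should_offload(interference=2.0, recon_error=0.1):
        latents = decorrelate(latents)
    print(latents)
```

In this sketch the gate keeps agents fully local in the common case and pays the coordination cost only when a local signal suggests the world model or the interference situation is unreliable, which is the selective behavior the abstract describes.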

