The integration of generative artificial intelligence with wireless communication and signal processing systems has opened new avenues for intelligent, data-driven decision-making in future 6G networks. This work proposes a diffusion soft actor-critic (Diffusion-SAC) approach that uses offline reinforcement learning (RL), enhanced by denoising diffusion probabilistic models (DDPMs), to optimize trajectory and scheduling control in unmanned aerial vehicle (UAV) networks. Offline RL methods such as conservative Q-learning (CQL) can learn from static datasets, but they often struggle to generalize in low-data or dynamic conditions. To address this, we combine the robustness of CQL with the generative power of diffusion models, enabling expressive, signal-aware policy learning that generalizes beyond the behavior policy. Applied to a UAV-assisted wireless network, the proposed framework minimizes transmission energy and improves fairness among devices. Simulations show that Diffusion-SAC outperforms standard offline RL baselines, achieving more stable convergence and higher rewards even with limited datasets. The method enhances data efficiency, reduces energy consumption, and increases throughput by more than 35% compared to existing algorithms, demonstrating its potential for robust policy learning in next-generation wireless control systems.