Next-generation wireless technologies, including beyond-5G and 6G networks, are paving the way for transformative applications such as vehicle platooning, smart cities, and remote surgery. These innovations are driven by a vast array of interconnected wireless entities, including Internet of Things (IoT) devices, access points, unmanned aerial vehicles (UAVs), and connected and autonomous vehicles (CAVs), which increase network complexity and demand more advanced decision-making algorithms. Artificial intelligence (AI) and machine learning (ML), especially reinforcement learning (RL), are key enablers for such networks, providing solutions to high-dimensional and complex challenges. However, as networks expand into multi-agent environments, traditional online RL approaches face limitations in cost, safety, and scalability. Offline multi-agent reinforcement learning (MARL) offers a promising alternative by learning from pre-collected data, reducing the need for real-time interaction with the live network. This article introduces a novel offline MARL algorithm based on conservative Q-learning (CQL), which ensures safe and efficient training. We extend this algorithm with meta-learning to handle dynamic environments and validate the approach through use cases in radio resource management and UAV networks. Our work highlights the advantages, limitations, and future directions of offline MARL in wireless applications.
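For orientation, the sketch below illustrates the standard single-agent CQL penalty (Kumar et al., 2020) that conservative Q-learning adds on top of ordinary temporal-difference training. It is not the article's multi-agent or meta-learning objective; the network, batch layout, and hyperparameter names are illustrative assumptions.

```python
import torch

def cql_loss(q_net, target_q_net, batch, alpha=1.0, gamma=0.99):
    """Single-agent CQL objective for discrete actions: TD error plus a
    conservative penalty that pushes down Q-values on out-of-distribution
    actions. Batch fields and shapes are illustrative assumptions."""
    s, a, r, s_next, done = batch            # a: int64 [B]; done: float [B]
    q_all = q_net(s)                         # [B, num_actions]
    q_taken = q_all.gather(1, a.unsqueeze(1)).squeeze(1)  # [B]

    # Standard one-step TD target computed from a frozen target network.
    with torch.no_grad():
        next_q = target_q_net(s_next).max(dim=1).values
        td_target = r + gamma * (1.0 - done) * next_q

    td_loss = torch.nn.functional.mse_loss(q_taken, td_target)

    # Conservative regularizer: log-sum-exp over all actions minus the
    # Q-value of the dataset action keeps learned values pessimistic on
    # actions the offline dataset never took.
    conservative = (torch.logsumexp(q_all, dim=1) - q_taken).mean()

    return td_loss + alpha * conservative
```

In the multi-agent setting the article targets, each agent would typically carry such a loss over its own action space under a centralized-training, decentralized-execution scheme; that extension, and the meta-learning wrapper for dynamic environments, are beyond this sketch.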