The Metaverse is a new paradigm that aims to create a virtual environment consisting of numerous worlds, each offering a different set of services. To cope with such a dynamic and complex scenario under the stringent quality-of-service requirements targeted by sixth-generation (6G) communication systems, one potential approach is to adopt self-sustaining strategies, which can be realized with Adaptive Artificial Intelligence (Adaptive AI), where models are continually retrained with new data and conditions. One aspect of self-sustainability is managing multiple access to the frequency spectrum. Although several innovative methods have been proposed to address this challenge, mostly based on Deep Reinforcement Learning (DRL), the problem of adapting agents to a non-stationary environment has not yet been precisely addressed. This paper fills this gap by investigating multiple access in multi-channel environments, with the goal of maximizing the throughput of the intelligent agent when the number of active User Equipments (UEs) may fluctuate over time. To solve the problem, a Double Deep Q-Learning (DDQL) technique empowered by Continual Learning (CL) is proposed to overcome the non-stationarity while the environment remains unknown. Numerical simulations demonstrate that, compared to other well-known methods, the CL-DDQL algorithm achieves significantly higher throughput with a considerably shorter convergence time in highly dynamic scenarios.
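As a rough illustration of the double Q-learning idea that DDQL builds on, the sketch below computes the standard double Q-learning target: the online estimate selects the next action and a separate target estimate evaluates it. The abstract does not specify the network architecture, reward design, or continual-learning mechanism, so none of those are shown. The function name `double_q_targets`, the discount value, the interpretation of actions as channel choices, and the example numbers are all assumptions made for illustration.

```python
def double_q_targets(rewards, next_q_online, next_q_target, gamma=0.9):
    """Standard double Q-learning targets for a batch of transitions.

    rewards:       list of immediate rewards, one per transition
    next_q_online: per-transition lists of online Q-values at the next state
    next_q_target: per-transition lists of target Q-values at the next state
    gamma:         discount factor (0.9 is an assumed value)
    """
    targets = []
    for r, q_on, q_tg in zip(rewards, next_q_online, next_q_target):
        # The online estimate picks the greedy action ...
        best_action = max(range(len(q_on)), key=lambda a: q_on[a])
        # ... and the target estimate scores that action.
        targets.append(r + gamma * q_tg[best_action])
    return targets


# Hypothetical example: 3 actions (e.g. wait, channel 1, channel 2).
example = double_q_targets(
    rewards=[1.0, 0.0],
    next_q_online=[[0.2, 0.8, 0.1], [0.5, 0.4, 0.3]],
    next_q_target=[[0.3, 0.6, 0.9], [0.7, 0.2, 0.1]],
)
# example is approximately [1.54, 0.63]
```

Decoupling action selection from action evaluation is what distinguishes double Q-learning from plain Q-learning, where a single estimate does both and tends to overestimate action values.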