Digital twin (DT) platforms are increasingly regarded as a promising technology for controlling, optimizing, and monitoring complex engineering systems such as next-generation wireless networks. An important challenge in adopting DT solutions is their reliance on data collected offline, without direct access to the physical environment. This limitation is particularly severe in multi-agent systems, for which conventional multi-agent reinforcement learning (MARL) requires online interactions with the environment. A direct application of online MARL schemes to an offline setting would generally fail due to the epistemic uncertainty entailed by the limited availability of data. In this work, we propose an offline MARL scheme for DT-based wireless networks that integrates distributional reinforcement learning (RL) and conservative Q-learning to address both the environment's inherent aleatoric uncertainty and the epistemic uncertainty arising from limited data. To further exploit the offline data, we adapt the proposed scheme to the centralized training with decentralized execution framework, allowing joint training of the agents' policies. The proposed MARL scheme, referred to as multi-agent conservative quantile regression (MA-CQR), addresses general risk-sensitive design criteria and is applied to the trajectory planning problem in drone networks, showcasing its advantages.
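The abstract names three ingredients without giving formulas: quantile-based distributional RL, a conservative Q-learning penalty, and a risk-sensitive criterion. The following is a minimal single-agent NumPy sketch of how such terms are commonly written, not the authors' MA-CQR implementation. All function names, the Huber threshold `kappa`, the risk level `xi`, and the choice of lower-tail CVaR as the risk measure are assumptions made for illustration.

```python
import numpy as np


def quantile_huber_loss(pred, target, kappa=1.0):
    """Quantile Huber loss between predicted quantiles `pred` (N,)
    and target return samples `target` (M,). Illustrative only."""
    n = pred.shape[0]
    taus = (np.arange(n) + 0.5) / n            # quantile midpoints in (0, 1)
    u = target[None, :] - pred[:, None]        # pairwise errors, shape (N, M)
    abs_u = np.abs(u)
    huber = np.where(abs_u <= kappa, 0.5 * u**2, kappa * (abs_u - 0.5 * kappa))
    weight = np.abs(taus[:, None] - (u < 0).astype(float))
    # Average over target samples, sum over predicted quantiles.
    return float((weight * huber / kappa).mean(axis=1).sum())


def conservative_penalty(quantiles, data_action):
    """CQL-style penalty for one state: log-sum-exp of Q over all actions
    minus Q of the logged action. `quantiles` has shape (A, N)."""
    q = quantiles.mean(axis=1)                 # expected return per action
    m = q.max()                                # shift for numerical stability
    lse = m + np.log(np.exp(q - m).sum())
    return float(lse - q[data_action])


def cvar(quantiles, xi):
    """Lower-tail CVaR at level `xi` (assumed risk measure): mean of the
    lowest ceil(xi * N) quantile values along the last axis."""
    n = quantiles.shape[-1]
    k = max(1, int(np.ceil(xi * n)))
    return np.sort(quantiles, axis=-1)[..., :k].mean(axis=-1)
```

In a sketch like this, the training objective would be the quantile loss plus a weighted conservative penalty, and actions would be ranked by `cvar` rather than by the mean; with `xi = 1.0` the CVaR reduces to the ordinary expected return. The multi-agent and centralized-training aspects described in the abstract are not covered here.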