In this paper, we investigate a novel digital network twin (DNT) assisted deep learning (DL) model training framework. In particular, we consider a physical network in which a base station (BS) uses several antennas to serve multiple mobile users, together with a DNT that serves as a virtual representation of the physical network. The BS must adjust its antenna tilt angles to optimize the data rates of all users. Due to user mobility, the BS may not be able to accurately track network dynamics such as wireless channels and user trajectories. Hence, a reinforcement learning (RL) approach is used to dynamically adjust the antenna tilt angles. To train the RL model, one can use data collected from both the physical network and the DNT. Data collected from the physical network is more accurate but incurs higher communication overhead than data collected from the DNT. Therefore, it is necessary to determine the ratio of data collected from the physical network to that collected from the DNT so as to improve the training of the RL model. We formulate this problem as an optimization problem whose goal is to jointly optimize the tilt angle adjustment policy and the data collection strategy, maximizing the data rates of all users while constraining the time delay introduced by collecting data from the physical network. To solve this problem, we propose a hierarchical RL framework that integrates a robust adversarial loss with proximal policy optimization (PPO). Simulation results show that our proposed method reduces the physical network data collection delay by up to 28.01% compared to a hierarchical RL scheme that uses vanilla PPO as the first-level RL, and by up to 1x compared to a baseline that uses robust RL at the first level and selects the data collection ratio randomly.