Using unmanned aerial vehicles (UAVs) equipped with edge servers to assist terrestrial mobile edge computing (MEC) has attracted tremendous attention. Nevertheless, state-of-the-art schemes based on deterministic optimization or single-objective reinforcement learning (RL) cannot simultaneously reduce the backlog of task bits and improve energy efficiency in highly dynamic network environments, where the design problem amounts to a sequential decision-making problem. To address these problems, as well as the curse of dimensionality introduced by the growing number of terrestrial users, this paper proposes a distributed multi-objective (MO) dynamic trajectory planning and offloading scheduling scheme that integrates multi-objective reinforcement learning (MORL) with the kernel method. An n-step return design is also applied to average out fluctuations in the backlog. Numerical results reveal that the n-step return benefits the proposed kernel-based approach, achieving a significant improvement in long-term average backlog performance compared to the conventional 1-step return design. Owing to this design and the kernel-based neural network, to which decision-making features can be continuously added, the kernel-based approach outperforms an approach based on a fully connected deep neural network, yielding improvements in energy consumption and backlog performance, as well as a significant reduction in decision-making and online learning time.
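To make the difference between the 1-step and n-step return designs concrete, the following is a minimal sketch of a vector-valued n-step return, the textbook quantity sum_{k<n} gamma^k r_k + gamma^n V(s_n) computed once per objective. The abstract does not specify the reward definition, the discount factor, or the value estimator, so the two-objective reward vectors (negative backlog, negative energy), the value `gamma = 0.5`, and the function name `n_step_return` are illustrative assumptions rather than the paper's actual design.

```python
from typing import List, Sequence


def n_step_return(rewards: Sequence[Sequence[float]],
                  bootstrap_value: Sequence[float],
                  gamma: float) -> List[float]:
    """Per-objective n-step return, where n = len(rewards).

    rewards[k] is the (hypothetical) reward vector observed k steps after
    the decision; bootstrap_value is the estimated value vector of the
    state reached after n steps.
    """
    n = len(rewards)
    # Start from the discounted bootstrap term gamma^n * V(s_n).
    g = [(gamma ** n) * v for v in bootstrap_value]
    # Add the discounted observed rewards, objective by objective.
    for k, reward in enumerate(rewards):
        for i, r_i in enumerate(reward):
            g[i] += (gamma ** k) * r_i
    return g


# Assumed toy data: objective 0 = negative backlog, objective 1 = negative energy.
observed = [[-4.0, -1.0], [-2.0, -1.0], [-6.0, -1.0]]
value_estimate = [-8.0, -4.0]

three_step = n_step_return(observed, value_estimate, gamma=0.5)
one_step = n_step_return(observed[:1], value_estimate, gamma=0.5)
print(three_step, one_step)
```

With n = 1 the target depends on a single observed backlog sample plus the bootstrap estimate, whereas a larger n mixes several consecutive samples into the target, which is the sense in which an n-step design can average out short-term backlog fluctuations.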