Deep reinforcement learning (DRL) has achieved remarkable progress in online path planning for multi-UAV systems. However, existing DRL-based methods often suffer performance degradation in unseen scenarios, since non-causal factors in visual representations adversely affect policy learning. To address this issue, we propose a novel representation learning approach, \ie, causal representation disentanglement, which identifies the causal and non-causal factors in representations. We then pass only the causal factors to subsequent policy learning, explicitly eliminating the influence of non-causal factors and thereby improving the generalization ability of DRL models. Experimental results show that our method achieves robust navigation and effective collision avoidance, particularly in unseen scenarios, significantly outperforming existing state-of-the-art (SOTA) algorithms.
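The core idea above can be illustrated with a minimal sketch: a representation is split into causal and non-causal factors, and only the causal part is forwarded to the policy. All names (`disentangle`, `policy`, the binary mask) are hypothetical illustrations, not the paper's actual learned modules; in the proposed method the factor identification is learned, whereas here a fixed mask stands in for it.

```python
# Hypothetical sketch of causal representation disentanglement.
# In the actual method the causal/non-causal split is learned;
# here a given binary mask stands in for that learned identification.

def disentangle(representation, causal_mask):
    """Split a representation into causal and non-causal factors."""
    causal = [v for v, m in zip(representation, causal_mask) if m]
    noncausal = [v for v, m in zip(representation, causal_mask) if not m]
    return causal, noncausal

def policy(causal_factors):
    """Toy stand-in policy: acts only on the causal factors."""
    return sum(causal_factors) / max(len(causal_factors), 1)

z = [0.9, 0.1, 0.4, 0.7]   # visual representation (4 factors, illustrative)
mask = [1, 0, 1, 0]        # 1 = causal factor, 0 = non-causal factor
z_c, z_nc = disentangle(z, mask)
action = policy(z_c)       # non-causal factors never reach the policy
```

Because the policy's input excludes the non-causal factors entirely, spurious visual variation (e.g., scenario-specific appearance) cannot perturb the learned behavior, which is the mechanism behind the claimed generalization gain.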