Deep reinforcement learning (DRL) has achieved remarkable progress in online path planning for multi-UAV systems. However, existing DRL-based methods often suffer performance degradation in unseen scenarios, because non-causal factors in the visual representations adversely affect policy learning. To address this issue, we propose a novel representation learning approach, \ie, causal representation disentanglement, which identifies the causal and non-causal factors in the representations. We then pass only the causal factors to subsequent policy learning, explicitly eliminating the influence of the non-causal factors and thereby improving the generalization ability of DRL models. Experimental results show that the proposed method achieves robust navigation and effective collision avoidance, especially in unseen scenarios, and significantly outperforms existing state-of-the-art (SOTA) algorithms.
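The abstract does not specify how the disentangled factors are routed to the policy. The following is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes a hypothetical interface in which the disentanglement stage yields a per-dimension boolean mask over the encoder's representation, and it substitutes a toy linear policy head for the DRL policy network. It illustrates only the routing step. Because the policy sees only the causal factors, perturbing the non-causal dimensions, for example when moving to an unseen scene, leaves the action unchanged.

```python
from typing import List, Sequence, Tuple


def split_factors(
    z: Sequence[float], causal_mask: Sequence[bool]
) -> Tuple[List[float], List[float]]:
    """Partition a representation into causal and non-causal factors.

    `causal_mask` is a hypothetical per-dimension flag that a disentanglement
    stage would produce; in this sketch it is supplied by hand.
    """
    if len(z) != len(causal_mask):
        raise ValueError("representation and mask must have the same length")
    causal = [v for v, c in zip(z, causal_mask) if c]
    non_causal = [v for v, c in zip(z, causal_mask) if not c]
    return causal, non_causal


def linear_policy(
    causal: Sequence[float], weights: Sequence[float], bias: float = 0.0
) -> float:
    """Toy stand-in for a policy network: a scalar action from causal factors only."""
    if len(causal) != len(weights):
        raise ValueError("weight vector must match the number of causal factors")
    return sum(w * x for w, x in zip(weights, causal)) + bias


def act(
    z: Sequence[float],
    causal_mask: Sequence[bool],
    weights: Sequence[float],
    bias: float = 0.0,
) -> float:
    """Route only the causal factors to the policy; the rest are discarded."""
    causal, _non_causal = split_factors(z, causal_mask)
    return linear_policy(causal, weights, bias)


if __name__ == "__main__":
    mask = [True, False, True, False]  # hypothetical: dims 0 and 2 are causal
    w = [0.5, -1.0]
    z_seen = [1.0, 0.3, 2.0, -0.7]
    z_unseen = [1.0, 9.9, 2.0, 4.2]  # only the non-causal dims differ
    print(act(z_seen, mask, w), act(z_unseen, mask, w))  # prints "-1.5 -1.5"
```

Discarding the non-causal slice outright, rather than down-weighting it, mirrors the abstract's claim that their influence is explicitly eliminated. How the mask itself is identified is the substance of the proposed disentanglement method and is not modeled here.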