Planning safe paths in unknown and uncertain environments is a challenging aspect of leader-follower formation control. In this architecture, the leader moves toward the target by taking optimal actions, while the followers must avoid obstacles and maintain the desired formation shape. Most studies in this field have treated formation control and obstacle avoidance separately. The present study proposes a new approach based on deep reinforcement learning (DRL) for end-to-end motion planning and control of under-actuated autonomous underwater vehicles (AUVs). The aim is to design optimal adaptive distributed controllers, based on an actor-critic structure, for AUV formation motion planning. This is accomplished by controlling the speed and heading of the AUVs. Two approaches to obstacle avoidance are deployed. In the first, control policies are designed for the leader and the followers such that each learns its own collision-free path, while the followers also adhere to an overall formation maintenance policy. In the second, only the leader learns a control policy and safely leads the whole group toward the target; the followers' control policy is to maintain the predetermined distance and angle. The robustness of the proposed method is demonstrated under realistic perturbations: ocean currents, communication delays, and sensing errors. The efficiency of the algorithms is evaluated and validated through a number of computer simulations.
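To make the second approach concrete, the following is a minimal geometric sketch of a follower that keeps a predetermined distance and angle by commanding speed and heading. It is not the DRL controller proposed in the study. It assumes the distance and angle are measured relative to the leader, and it uses a plain proportional rule on a planar model. All names (`follower_target`, `follower_command`), gains, and limits are hypothetical and chosen only for illustration.

```python
import math


def wrap_angle(a):
    """Wrap an angle in radians to the interval [-pi, pi]."""
    return math.atan2(math.sin(a), math.cos(a))


def follower_target(leader_x, leader_y, leader_heading, distance, angle):
    """Desired follower position: `distance` from the leader, at bearing
    `angle` measured from the leader's heading (planar assumption)."""
    bearing = leader_heading + angle
    return (leader_x + distance * math.cos(bearing),
            leader_y + distance * math.sin(bearing))


def follower_command(fx, fy, f_heading, target,
                     k_speed=0.5, k_heading=1.0, max_speed=2.0):
    """Proportional speed and turn-rate commands toward the target slot.
    The gains and speed limit are illustrative assumptions."""
    tx, ty = target
    dx, dy = tx - fx, ty - fy
    distance_error = math.hypot(dx, dy)
    heading_error = wrap_angle(math.atan2(dy, dx) - f_heading)
    speed = min(max_speed, k_speed * distance_error)
    turn_rate = k_heading * heading_error
    return speed, turn_rate


# Leader at the origin heading along +x; follower slot 2 m directly behind.
slot = follower_target(0.0, 0.0, 0.0, distance=2.0, angle=math.pi)
speed, turn_rate = follower_command(-4.0, 0.0, 0.0, slot)
```

In this sketch the follower only tracks its formation slot. Obstacle avoidance is left entirely to the leader's path, which mirrors the division of roles described for the second approach.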