This paper proposes a spatial-temporal recurrent neural network architecture for deep $Q$-networks that can be used to steer an autonomous ship. The network design makes it possible to handle an arbitrary number of surrounding target ships while offering robustness to partial observability. In addition, a state-of-the-art collision risk metric is proposed that makes it easier for the agent to assess different situations. The COLREG rules of maritime traffic are explicitly considered in the design of the reward function. The final policy is validated on a custom set of newly created single-ship encounters, called `Around the Clock' problems, and on the commonly used Imazu (1987) problems, which include 18 multi-ship scenarios. Performance comparisons with artificial potential field and velocity obstacle methods demonstrate the potential of the proposed approach for maritime path planning. Finally, the new architecture exhibits robustness when deployed in multi-agent scenarios and is compatible with other deep reinforcement learning algorithms, including actor-critic frameworks.
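The abstract states that the network can handle an arbitrary number of target ships but does not say how. The following is a minimal sketch under stated assumptions, not the authors' architecture. It runs a simple Elman-style recurrent cell once per target ship, which folds a variable-length set of ships into a fixed-size vector that a $Q$-value head can consume. The feature layout, layer sizes, three-action set, random weights, and the "nearest ship last" ordering are all hypothetical. Only the aggregation across ships is shown; the temporal recurrence over past observations is omitted.

```python
import math
import random
from typing import List, Sequence

# Hypothetical sizes, chosen only for illustration.
FEAT_DIM = 4      # per-ship features, e.g. bearing, distance, course, speed
HIDDEN_DIM = 8    # size of the fixed-length ship summary
N_ACTIONS = 3     # e.g. turn to port, keep course, turn to starboard
DIST_IDX = 1      # hypothetical position of the distance feature


def init_matrix(rows: int, cols: int, rng: random.Random) -> List[List[float]]:
    """Small random weights; a real agent would learn these."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m: List[List[float]], v: Sequence[float]) -> List[float]:
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def order_by_distance(ships: List[Sequence[float]]) -> List[Sequence[float]]:
    """Assumed ordering: farthest first, so the nearest ship is processed last."""
    return sorted(ships, key=lambda s: s[DIST_IDX], reverse=True)


class RecurrentShipEncoder:
    """Elman-style RNN that folds any number of target ships into one vector."""

    def __init__(self, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w_in = init_matrix(HIDDEN_DIM, FEAT_DIM, rng)
        self.w_h = init_matrix(HIDDEN_DIM, HIDDEN_DIM, rng)
        self.b = [0.0] * HIDDEN_DIM

    def encode(self, ships: List[Sequence[float]]) -> List[float]:
        h = [0.0] * HIDDEN_DIM  # zero state when no target ship is present
        for x in ships:
            # h_t = tanh(W_in x_t + W_h h_{t-1} + b)
            pre = [a + c + d for a, c, d in
                   zip(matvec(self.w_in, x), matvec(self.w_h, h), self.b)]
            h = [math.tanh(p) for p in pre]
        return h


class QHead:
    """Linear map from [own-ship state, ship summary] to one Q-value per action."""

    def __init__(self, own_dim: int, seed: int = 1) -> None:
        rng = random.Random(seed)
        self.w = init_matrix(N_ACTIONS, own_dim + HIDDEN_DIM, rng)

    def q_values(self, own_state: Sequence[float],
                 summary: Sequence[float]) -> List[float]:
        return matvec(self.w, list(own_state) + list(summary))
```

For example, `QHead(own_dim=2).q_values([0.0, 1.0], RecurrentShipEncoder().encode(order_by_distance(ships)))` returns three values whether `ships` holds zero, one, or five entries. This is why a recurrent pass is one natural way to support a variable number of targets. A fixed-size input layer would instead need padding or truncation to a maximum ship count.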