An unmanned surface vehicle (USV) can perform complex missions by continuously observing the state of its surroundings and acting toward a goal. A swarm of USVs working together can complete missions faster and more effectively than a single USV. In this paper, we propose an autonomous communication model for a swarm of USVs. The goal is to implement this model as a software system using the Robot Operating System (ROS) and Gazebo. With coordinated task completion as the main objective, the Markov decision process (MDP) provides the basis for formulating the task decision problem, so as to achieve efficient localization and tracking in a highly dynamic water environment. To coordinate multiple USVs performing real-time target tracking, we propose an enhanced multi-agent reinforcement learning approach. The proposed scheme uses Multi-Agent Deep Deterministic Policy Gradient (MA-DDPG), an extension of the Deep Deterministic Policy Gradient (DDPG) algorithm that allows decentralized control of multiple agents in a cooperative environment. Under decentralized control, each agent makes decisions based on its own observations and objectives, which can improve overall performance and stability. In addition, MA-DDPG supports communication and coordination among agents through shared observations and rewards.
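The division of labor described above, where each agent acts on its own observation while coordination comes from shared observations and rewards, can be illustrated with a minimal sketch. It assumes the common MA-DDPG arrangement of one decentralized actor per agent and a critic that scores the joint observations and actions during training; the enhanced variant proposed in this paper may differ. The class names (`Actor`, `CentralCritic`), the linear models, and the dimensions are hypothetical and chosen only for illustration. No gradient updates are shown, and the online networks are reused as target networks for brevity.

```python
import math
import random


class Actor:
    """Decentralized deterministic policy: sees only its own agent's observation."""

    def __init__(self, obs_dim, act_dim, rng):
        # Hypothetical linear policy; a real actor would be a neural network.
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(obs_dim)]
                  for _ in range(act_dim)]

    def act(self, obs):
        # tanh keeps every action component in [-1, 1].
        return [math.tanh(sum(wi * oi for wi, oi in zip(row, obs)))
                for row in self.w]


class CentralCritic:
    """Training-time critic: scores the joint observations and joint actions."""

    def __init__(self, joint_dim, rng):
        self.w = [rng.uniform(-0.1, 0.1) for _ in range(joint_dim)]

    def q(self, all_obs, all_acts):
        # Flatten every agent's observation and action into one joint vector.
        joint = [v for vec in all_obs + all_acts for v in vec]
        return sum(wi * vi for wi, vi in zip(self.w, joint))


def td_target(reward, gamma, target_critic, target_actors, next_obs):
    """y = r + gamma * Q'(x', a'_1, ..., a'_N), with a'_j from each target actor."""
    next_acts = [actor.act(o) for actor, o in zip(target_actors, next_obs)]
    return reward + gamma * target_critic.q(next_obs, next_acts)


rng = random.Random(0)
N_AGENTS, OBS_DIM, ACT_DIM = 3, 4, 2  # assumed sizes, not from the paper

actors = [Actor(OBS_DIM, ACT_DIM, rng) for _ in range(N_AGENTS)]
critic = CentralCritic(N_AGENTS * (OBS_DIM + ACT_DIM), rng)

# Each USV observes locally and chooses its own action.
obs = [[rng.uniform(-1, 1) for _ in range(OBS_DIM)] for _ in range(N_AGENTS)]
acts = [actor.act(o) for actor, o in zip(actors, obs)]

# The critic evaluates the joint behavior; the shared reward enters the target.
q_value = critic.q(obs, acts)
y = td_target(reward=1.0, gamma=0.95, target_critic=critic,
              target_actors=actors, next_obs=obs)
```

At execution time only the `Actor.act` calls are needed, so each USV can run its policy from local observations alone; the joint critic is used only while learning.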