A comparison of RL-based and PID controllers for 6-DOF swimming robots: hybrid underwater object tracking

In this paper, we explore and assess the use of a centralized deep Q-network (DQN) controller as a substitute for the PID controllers commonly used on 6-DOF swimming robots, using underwater object tracking as the illustrative task. DQN offers data efficiency and off-policy learning while remaining simpler to implement than other reinforcement learning (RL) methods. Because no dynamic model of our robot is available, we propose an RL agent to control this multi-input multi-output (MIMO) system, where a single centralized controller may provide more robust control than separate PIDs. Our approach first uses classical controllers for safe exploration and then gradually shifts control to the DQN until it fully controls the robot. We divide the underwater tracking task into vision and control modules: the vision module uses established vision-based tracking methods, and the control module is the centralized DQN controller. Because only bounding box data is passed from the vision module to the control module, the system adapts to different objects and the vision system can be replaced easily. Working with this low-dimensional data also makes online learning of the controller inexpensive. Experiments in a Unity-based simulator validate the effectiveness of a centralized RL agent over separate PID controllers, demonstrate that our framework is applicable to training the underwater RL agent, and show improved performance compared with traditional control methods. The code for both the real-robot and simulation implementations is available at https://github.com/FARAZLOTFI/underwater-object-tracking.
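The control pipeline described above (a bounding box goes in, and per-axis PIDs gradually hand over to one centralized DQN) can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration and is not taken from the paper or its repository: the three controlled axes (yaw, heave, surge), the box-to-error mapping `bbox_to_state`, the hypothetical 27-action discrete set `ACTIONS`, the PID gains, and the linear handover schedule `pid_share`. The trained Q-network is replaced by a caller-supplied `q_values` function.

```python
import itertools
import random
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical discrete action set: each of 3 axes (yaw, heave, surge) takes a
# command in {-1, 0, +1}, giving 3**3 = 27 joint actions for ONE centralized DQN.
LEVELS = (-1.0, 0.0, 1.0)
ACTIONS: List[Tuple[float, ...]] = list(itertools.product(LEVELS, repeat=3))


@dataclass
class PID:
    """Textbook PID on a scalar error; one instance per controlled axis."""
    kp: float
    ki: float = 0.0
    kd: float = 0.0
    dt: float = 0.1
    _integral: float = field(default=0.0, init=False)
    _prev: Optional[float] = field(default=None, init=False)

    def step(self, error: float) -> float:
        self._integral += error * self.dt
        deriv = 0.0 if self._prev is None else (error - self._prev) / self.dt
        self._prev = error
        return self.kp * error + self.ki * self._integral + self.kd * deriv


def bbox_to_state(bbox, img_w, img_h, target_area=0.1):
    """Map a pixel box (x_min, y_min, x_max, y_max) to a low-dimensional state:
    normalized horizontal offset, vertical offset, and size error (assumed)."""
    x0, y0, x1, y1 = bbox
    ex = ((x0 + x1) / 2 - img_w / 2) / (img_w / 2)  # in [-1, 1]
    ey = ((y0 + y1) / 2 - img_h / 2) / (img_h / 2)  # in [-1, 1]
    area = (x1 - x0) * (y1 - y0) / (img_w * img_h)  # fraction of the image
    return (ex, ey, target_area - area)


def nearest_action(command: Sequence[float]) -> int:
    """Snap a continuous per-axis PID command to the closest discrete action index."""
    snapped = []
    for c in command:
        c = max(-1.0, min(1.0, c))  # clip to the action range
        snapped.append(min(LEVELS, key=lambda level: abs(level - c)))
    return ACTIONS.index(tuple(snapped))


class HybridController:
    """Hands control from separate per-axis PIDs to one centralized Q-function."""

    def __init__(self, q_values: Callable[[Sequence[float]], Sequence[float]],
                 handover_episodes: int = 100):
        self.pids = [PID(kp=2.0), PID(kp=2.0), PID(kp=5.0)]  # hypothetical gains
        self.q_values = q_values  # stand-in for a trained DQN
        self.handover_episodes = handover_episodes

    def pid_share(self, episode: int) -> float:
        # Linear decay: PID always acts at episode 0, never after handover_episodes.
        return max(0.0, 1.0 - episode / self.handover_episodes)

    def act(self, state, episode: int, rng: random.Random) -> Tuple[int, str]:
        # Step the PIDs every call so their integral/derivative terms stay current.
        cmd = [pid.step(e) for pid, e in zip(self.pids, state)]
        if rng.random() < self.pid_share(episode):
            return nearest_action(cmd), "pid"
        q = self.q_values(state)
        return max(range(len(ACTIONS)), key=lambda a: q[a]), "dqn"  # greedy


if __name__ == "__main__":
    ctrl = HybridController(q_values=lambda s: [0.0] * len(ACTIONS))
    state = bbox_to_state((400, 200, 560, 320), img_w=640, img_h=480)
    print(ctrl.act(state, episode=0, rng=random.Random(0)))  # → (22, 'pid')
```

Returning a discrete action index even while the PIDs are acting is a deliberate choice in this sketch: since DQN learns off-policy, transitions generated under PID control can, in principle, be stored in the same replay buffer as DQN-chosen ones. Exploration noise (e.g. epsilon-greedy) and the training loop itself are omitted.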
