Learning-based adaptive control methods hold the promise of enabling autonomous agents to reduce the effect of process variations with minimal human intervention. However, their application to autonomous underwater vehicles (AUVs) has so far been restricted by 1) unknown dynamics in the form of sea current disturbances, which cannot be properly modelled or measured because of limited sensor capability, and 2) the nonlinearity of AUV tasks, where the controller response at some operating points must be overly conservative in order to satisfy the specification at other operating points. Deep Reinforcement Learning (DRL) can alleviate these limitations by training general-purpose neural network policies, but applications of DRL algorithms to AUVs have been restricted to simulated environments because of their inherently high sample complexity and the distribution shift problem. This paper presents a novel approach that merges the Maximum Entropy Deep Reinforcement Learning framework with a classic model-based control architecture to formulate an adaptive controller. Within this framework, we introduce a Sim-to-Real transfer strategy comprising the following components: a bio-inspired experience replay mechanism, an enhanced domain randomisation technique, and an evaluation protocol executed on a physical platform. Our experimental assessments demonstrate that this method effectively learns proficient policies from suboptimal simulated models of the AUV, resulting in control performance 3 times higher when transferred to a real-world vehicle, compared to its model-based nonadaptive but optimal counterpart.