We explore sim-to-real transfer of deep reinforcement learning controllers for a heavy vehicle with active suspensions designed for traversing rough terrain. While related research primarily focuses on lightweight robots with electric motors and fast actuation, this study uses a forestry vehicle with a complex hydraulic driveline and slow actuation. We simulate the vehicle using multibody dynamics and apply system identification to find an appropriate set of simulation parameters. We then train policies in simulation using various techniques to mitigate the sim-to-real gap, including domain randomization, action delays, and a reward penalty to encourage smooth control. On the real vehicle, the policies trained with action delays and a penalty for erratic actions perform at nearly the same level as in simulation. In experiments on level ground, the simulated and real motion trajectories closely overlap when turning to either side, as well as in a route tracking scenario. When faced with a ramp that requires active use of the suspensions, the simulated and real motions are also in close alignment. This shows that the actuator model, together with system identification, represents the real actuators with sufficient accuracy. We observe that policies trained without the additional action penalty exhibit fast switching, or bang-bang control. These policies produce smooth motions and high performance in simulation but transfer poorly to reality. We find that the policies make only marginal use of the local height map for perception and show no indications of look-ahead planning. However, the strong transfer capabilities imply that further development concerning perception and performance can be largely confined to simulation.
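To make two of the mitigation techniques named above concrete, the following is a minimal Python sketch of an action delay combined with a smoothness penalty. It is an illustration under stated assumptions, not the authors' implementation: the class name `DelayedSmoothActions`, the fixed delay measured in control steps, the neutral zero action used to pre-fill the queue, and the squared-difference form of the penalty are all hypothetical choices, since the abstract does not specify them.

```python
from collections import deque
from typing import Deque, List, Sequence, Tuple


class DelayedSmoothActions:
    """Delays actions by a fixed number of steps and penalizes action changes.

    Hypothetical helper: the delay length, penalty weight, and penalty form
    are assumptions for illustration and are not taken from the source.
    """

    def __init__(self, action_dim: int, delay_steps: int, penalty_weight: float):
        self.penalty_weight = penalty_weight
        neutral = [0.0] * action_dim
        # Pre-fill with neutral actions so the first applied actions are defined.
        self.queue: Deque[List[float]] = deque(
            [list(neutral) for _ in range(delay_steps)]
        )
        self.previous = list(neutral)

    def step(self, action: Sequence[float]) -> Tuple[List[float], float]:
        """Return (action applied at this step, reward penalty for this step)."""
        action = list(action)
        # Penalty: weighted squared change between consecutive policy outputs.
        change = sum((a - p) ** 2 for a, p in zip(action, self.previous))
        penalty = -self.penalty_weight * change
        self.previous = action
        # Delay: enqueue the newest action and apply the oldest queued one.
        self.queue.append(action)
        applied = self.queue.popleft()
        return applied, penalty
```

In a training loop, `applied` would be sent to the simulated actuators and `penalty` added to the task reward, so a policy that switches rapidly between extremes accumulates a large negative term. Sampling `delay_steps` anew for each episode would be one simple way to combine this with domain randomization, again as an assumption rather than a description of the study's setup.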