Deep reinforcement learning has achieved significant results in low-level control tasks. However, in applications such as autonomous driving and drone flying, stable control is difficult because the agent may change its actions abruptly, which often lowers the control system's efficiency, induces excessive mechanical wear, and can cause uncontrollable, dangerous vehicle behavior. Recently, a method called conditioning for action policy smoothness (CAPS) was proposed to reduce jerkiness for policies with low-dimensional features, in applications such as quadrotor drones. To cope with high-dimensional features, this paper proposes image-based regularization for action smoothness (I-RAS) to address jerky control in autonomous miniature car racing. We also introduce impact ratio (IR) control, which adaptively adjusts the regularization weight that governs the smoothness constraint. In our experiments, an agent with I-RAS and IR control improves the success rate from 59% to 95%. In the real-world track experiment, the agent also outperforms other methods, reducing the average lap time while improving the completion rate, even without any real-world training. These results are further supported by an I-RAS-based agent winning the 2022 AWS DeepRacer Final Championship Cup.