In this paper, we propose Back-stepping Experience Replay (BER), a novel technique that is compatible with arbitrary off-policy reinforcement learning (RL) algorithms. BER aims to improve learning efficiency in systems with approximate reversibility, reducing the need for complex reward shaping. The method constructs reversed trajectories from back-stepping transitions that reach random or fixed targets. BER can be interpreted as a bi-directional approach, and it compensates for inaccuracies in the back-stepping transitions by distilling the replay experience during learning. Because soft robots are intricate and interact with their environments in complex ways, we apply BER in a model-free RL approach to the locomotion and navigation of a soft snake robot, which achieves serpentine motion through anisotropic friction between its body and the ground. We also develop a dynamic simulator to assess the effectiveness and efficiency of the BER algorithm. In simulation, the robot learns successfully (reaching a 100% success rate) and reaches random targets at an average speed 48% faster than that of the best baseline approach.
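The idea of building a reversed trajectory from back-stepping transitions can be illustrated with a minimal sketch. The abstract does not specify how a back-stepping transition is computed, so everything concrete here is an assumption made for illustration: the names `Transition`, `negate`, `sparse_reward`, and `back_step` are hypothetical, the reverse action is taken to be the negated forward action, the reward is a sparse goal-reaching reward, and the target of the reversed trajectory is taken to be the start state of the forward one. The distillation of the replay experience mentioned above is not shown.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, ...]
Action = Tuple[float, ...]


@dataclass(frozen=True)
class Transition:
    """One goal-conditioned replay entry (hypothetical layout)."""
    state: State
    action: Action
    next_state: State
    goal: State
    reward: float


def negate(action: Action) -> Action:
    # ASSUMPTION: in an approximately reversible system, applying the
    # negated action roughly undoes the forward step.
    return tuple(-a for a in action)


def sparse_reward(state: State, goal: State, tol: float = 1e-6) -> float:
    # ASSUMPTION: 0 when the goal is reached, -1 otherwise.
    dist = sum((s - g) ** 2 for s, g in zip(state, goal)) ** 0.5
    return 0.0 if dist <= tol else -1.0


def back_step(
    forward: Sequence[Tuple[State, Action, State]],
    reverse_action: Callable[[Action], Action] = negate,
) -> List[Transition]:
    """Turn a forward trajectory of (s, a, s') steps into a reversed one.

    The reversed trajectory starts where the forward one ended and is
    labeled with the forward start state as its goal.
    """
    if not forward:
        return []
    goal = forward[0][0]
    reversed_traj = []
    # Walk the forward steps from last to first, swapping the roles of
    # state and next_state and replacing the action with its reverse.
    for state, action, next_state in reversed(forward):
        reversed_traj.append(
            Transition(
                state=next_state,
                action=reverse_action(action),
                next_state=state,
                goal=goal,
                reward=sparse_reward(state, goal),
            )
        )
    return reversed_traj


# Usage: a 1-D agent moves 0 -> 1 -> 2; the reversed trajectory moves 2 -> 1 -> 0.
forward = [((0.0,), (1.0,), (1.0,)), ((1.0,), (1.0,), (2.0,))]
back = back_step(forward)
```

In an off-policy setting, the reversed transitions would be stored in the replay buffer alongside the forward ones. Because the reversal is only approximate in a real system, such synthetic transitions carry some error, which is the inaccuracy the abstract says BER addresses during learning.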