Memory-based Controllers for Efficient Data-driven Control of Soft Robots

Controller design for soft robots is challenging because flexible materials deform nonlinearly and have many degrees of freedom. Data-driven methods are a promising way to address this problem. However, existing data-driven controller design methods for soft robots suffer from two drawbacks: (i) they require excessively long training times, and (ii) they may produce inefficient controllers. This paper addresses these issues by developing two memory-based controllers for soft robots that can be trained in a data-driven fashion: the finite memory controller (FMC) and the long short-term memory (LSTM) based controller. An FMC stores the tracking errors at different time instants and computes the actuation signal as a weighted sum of the stored tracking errors. We develop three reinforcement learning (RL) algorithms for computing the optimal weights of an FMC, based on the Q-learning, soft actor-critic (SAC), and deep deterministic policy gradient (DDPG) methods. An LSTM-based controller consists of an LSTM network whose inputs are the robot's desired configuration and current configuration. The network computes the actuation signal required for the soft robot to follow the desired configuration. We evaluate the proposed approaches on the control of a soft finger, using existing RL-based controllers and proportional-integral-derivative (PID) controllers as benchmarks. Our numerical results show that the proposed memory-based controllers train significantly faster than the classical RL-based controllers. Moreover, the proposed controllers achieve smaller tracking errors than both the classical RL algorithms and the PID controller.
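To make the FMC idea concrete, the following is a minimal sketch of a controller that keeps a fixed-length buffer of past tracking errors and outputs their weighted sum. The class name `FiniteMemoryController`, the scalar configuration, and the hand-picked weights are assumptions for illustration only; in the paper the weights are computed by reinforcement learning, which is not reproduced here.

```python
from collections import deque


class FiniteMemoryController:
    """Illustrative finite memory controller (hypothetical name and interface).

    Stores the most recent tracking errors and returns their weighted sum
    as the actuation signal. weights[0] multiplies the newest error.
    """

    def __init__(self, weights):
        self.weights = list(weights)
        # Fixed-length buffer, initialised with zero errors.
        self.errors = deque([0.0] * len(self.weights), maxlen=len(self.weights))

    def step(self, desired, measured):
        # Newest error goes to the front; the oldest one falls off the back.
        self.errors.appendleft(desired - measured)
        return sum(w * e for w, e in zip(self.weights, self.errors))


if __name__ == "__main__":
    # Placeholder weights chosen by hand, NOT learned values from the paper.
    controller = FiniteMemoryController([1.0, 0.5, 0.25])
    for measured in (0.0, 0.5, 1.0, 1.0):
        print(controller.step(desired=1.0, measured=measured))
```

With a memory length of one the sketch reduces to a proportional controller, which is one way to see why longer memories give the learning algorithm more freedom in shaping the response.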
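The LSTM-based controller can be sketched in a similar spirit: at each step the desired and current configurations are fed to a recurrent cell, and a linear readout of the hidden state gives the actuation signal. The code below is a forward pass only, written with NumPy under assumed dimensions and random, untrained weights. The class name `LSTMController`, the single-cell architecture, and the linear output layer are assumptions, not details taken from the paper.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class LSTMController:
    """Illustrative single-cell LSTM controller (hypothetical architecture)."""

    def __init__(self, config_dim, hidden_dim, act_dim, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = 2 * config_dim  # desired and current configuration
        self.hidden_dim = hidden_dim
        # Stacked weights for the input, forget, candidate, and output gates.
        self.W = rng.normal(0.0, 0.1, (4 * hidden_dim, in_dim + hidden_dim))
        self.b = np.zeros(4 * hidden_dim)
        # Linear readout from hidden state to actuation signal.
        self.W_out = rng.normal(0.0, 0.1, (act_dim, hidden_dim))
        self.b_out = np.zeros(act_dim)
        self.reset()

    def reset(self):
        self.h = np.zeros(self.hidden_dim)  # hidden state
        self.c = np.zeros(self.hidden_dim)  # cell state (the "memory")

    def step(self, desired, current):
        x = np.concatenate([desired, current, self.h])
        i, f, g, o = np.split(self.W @ x + self.b, 4)
        self.c = sigmoid(f) * self.c + sigmoid(i) * np.tanh(g)
        self.h = sigmoid(o) * np.tanh(self.c)
        return self.W_out @ self.h + self.b_out


if __name__ == "__main__":
    ctrl = LSTMController(config_dim=1, hidden_dim=8, act_dim=1)
    for current in (0.0, 0.3, 0.6):
        print(ctrl.step(np.array([1.0]), np.array([current])))
```

Because the cell state carries over between calls, the output depends on the history of configurations and not only on the current pair, which is the sense in which this controller is memory-based. Training such a network from data is outside the scope of this sketch.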
