We present an implementation of an online optimization algorithm for hitting a predefined target when returning ping-pong balls with a table tennis robot. The online algorithm optimizes over so-called interception policies, which define the manner in which the robot arm intercepts the ball. In our case, an interception policy consists of the state of the robot arm (position and velocity) at interception time. The optimization algorithm obtains gradient information from the mapping from the interception policy to the landing point of the ball on the table, which we approximate with both a black-box and a grey-box approach. We apply our algorithm to a robotic arm with four degrees of freedom that is driven by pneumatic artificial muscles. The robot arm is able to return the ball onto any predefined target on the table after about 2 to 5 iterations. Rapid convergence with both the black-box and the grey-box gradients highlights the robustness of our approach, and the small number of iterations required to reach close proximity to the target underlines its sample efficiency. A demonstration video can be found here: https://youtu.be/VC3KJoCss0k.
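To make the idea of optimizing an interception policy with an approximate gradient more concrete, the following is a minimal sketch and not the authors' implementation. Everything in it is an assumption made for illustration: `landing_point` is a hypothetical toy stand-in for the real robot and ball, `approx_model` is a deliberately imperfect model of it, the policy is a plain 4-vector, and the update is a Gauss-Newton-style step chosen for simplicity, which may differ from the update rule used in the paper.

```python
import numpy as np

# Hypothetical toy "plant": maps a 4-D interception policy to a 2-D landing point.
# It stands in for the real robot and ball, which the abstract does not specify.
A_TRUE = np.array([[0.8, 0.1, 0.3, 0.0],
                   [0.0, 0.7, 0.1, 0.4]])


def landing_point(policy):
    """Toy ground truth: mildly nonlinear map from policy to landing point."""
    return A_TRUE @ policy + 0.05 * np.sin(policy[:2])


def approx_model(policy):
    """Imperfect model of the plant (assumed 10% gain error, no nonlinearity)."""
    return 1.1 * A_TRUE @ policy


def approx_jacobian(policy, eps=1e-3):
    """Central finite differences on the approximate model, not on the plant."""
    jac = np.zeros((2, policy.size))
    for i in range(policy.size):
        step = np.zeros_like(policy)
        step[i] = eps
        jac[:, i] = (approx_model(policy + step) - approx_model(policy - step)) / (2 * eps)
    return jac


def optimize_policy(target, n_iter=10, step_size=1.0):
    """Iteratively adjust the policy so the observed landing point approaches the target."""
    policy = np.zeros(4)
    errors = []
    for _ in range(n_iter):
        hit = landing_point(policy)      # one "trial" on the toy plant
        err = hit - target               # observed landing error
        errors.append(float(np.linalg.norm(err)))
        jac = approx_jacobian(policy)    # gradient information from the model only
        # Gauss-Newton-style step using the pseudo-inverse of the model Jacobian.
        policy = policy - step_size * np.linalg.pinv(jac) @ err
    return policy, errors


if __name__ == "__main__":
    _, errs = optimize_policy(np.array([0.5, -0.3]))
    for k, e in enumerate(errs):
        print(f"trial {k}: landing error = {e:.2e}")
```

The point of the sketch is that each update uses the landing error measured on the plant but a Jacobian taken from an approximate model, so the iteration can still converge in a few trials even when the model is inaccurate. In this toy setting the model has a fixed gain error, and the error shrinks at every trial.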