Computing optimal controls in real time is challenging. To address this difficulty, many frameworks have proposed using learning techniques to learn (possibly suboptimal) controllers that can be used online. Among these techniques, the optimal motion framework is simple yet powerful, and it has been applied successfully in many complex real-world applications. Its main idea is to take advantage of dynamic motion primitives, a tool widely used in robotics to learn trajectories from demonstrations. While these demonstrations usually come from humans, the optimal motion framework relies on demonstrations drawn from optimal solutions, such as those obtained by numerical solvers. As with many learning techniques, a drawback of this approach is that the suboptimality of learned solutions is hard to estimate, since finding easily computable, non-trivial upper bounds on the error between an optimal solution and a learned solution is, in general, infeasible. However, we show in this paper that this error can be estimated for a broad class of problems. Furthermore, we apply this estimation technique to obtain a novel and more efficient sampling scheme for use within the optimal motion framework, enabling its use in some scenarios where computational resources are limited.