We approach the fundamental problem of obstacle avoidance for robotic systems through the lens of online learning. In contrast to prior work that assumes either worst-case realizations of uncertainty in the environment or a stationary stochastic model of uncertainty, we propose a method that is efficient to implement and provably grants instance-optimality with respect to perturbations of trajectories generated from an open-loop planner (in the sense of minimizing worst-case regret). The resulting policy adapts online to realizations of uncertainty and provably compares well with the best obstacle avoidance policy in hindsight from a rich class of policies. The method is validated in simulation on a dynamical system environment and compared to baseline open-loop planning and robust Hamilton-Jacobi reachability techniques. Further, it is implemented on a hardware example where a quadruped robot traverses a dense obstacle field and encounters input disturbances due to time delays, model uncertainty, and dynamics nonlinearities.