Nonlinear model predictive control (NMPC) is typically restricted to short, finite horizons to limit the computational burden of online optimization. As a result, a global planner is needed to avoid local minima when NMPC is used for navigation in complex environments, and the performance of NMPC approaches is often limited by that of the global planner. While control policies trained with reinforcement learning (RL) can theoretically learn to avoid such local minima, they are usually unable to guarantee enforcement of general state constraints. In this paper, we augment a sampling-based stochastic NMPC (SNMPC) approach with an RL-trained, perception-informed value function. This allows the system to avoid observable local minima in the environment by reasoning about perception information beyond the finite planning horizon. By using Probably Approximately Correct NMPC (PAC-NMPC) as our base controller, we are also able to generate statistical guarantees of performance and safety. We demonstrate our approach in simulation and on hardware using a 1/10th-scale rally car with lidar.
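To make the role of the value function concrete, the following is a minimal sketch of a finite-horizon sampling planner whose cost includes a terminal value term. It is plain random shooting on a toy one-dimensional system, not PAC-NMPC, and it provides no statistical guarantees. All names (`rollout`, `sequence_cost`, `sample_plan`), the dynamics, the costs, and the quadratic `terminal_value` (a hand-written stand-in for a learned, perception-informed value function) are assumptions made for illustration.

```python
import random


def rollout(x0, controls, dynamics):
    """Simulate the state trajectory produced by a control sequence."""
    states = [x0]
    for u in controls:
        states.append(dynamics(states[-1], u))
    return states


def sequence_cost(x0, controls, dynamics, stage_cost, terminal_value):
    """Sum of stage costs over the horizon plus a terminal value estimate."""
    states = rollout(x0, controls, dynamics)
    running = sum(stage_cost(x, u) for x, u in zip(states[:-1], controls))
    # The terminal term summarizes cost-to-go beyond the planning horizon.
    return running + terminal_value(states[-1])


def sample_plan(x0, dynamics, stage_cost, terminal_value,
                horizon=5, num_samples=256, rng=None):
    """Draw random control sequences and return the cheapest one with its cost."""
    if rng is None:
        rng = random.Random(0)
    best_controls, best_cost = None, float("inf")
    for _ in range(num_samples):
        controls = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        cost = sequence_cost(x0, controls, dynamics, stage_cost, terminal_value)
        if cost < best_cost:
            best_controls, best_cost = controls, cost
    return best_controls, best_cost


# Toy problem (hypothetical): a point on a line with a goal far beyond the horizon.
def toy_dynamics(x, u):
    return x + u


def toy_stage_cost(x, u):
    return 0.01 * u * u  # small control-effort penalty


def toy_terminal_value(x):
    return (x - 10.0) ** 2  # stand-in for a learned value function


if __name__ == "__main__":
    controls, cost = sample_plan(0.0, toy_dynamics, toy_stage_cost, toy_terminal_value)
    print(rollout(0.0, controls, toy_dynamics)[-1], cost)
```

In this sketch the goal lies outside the five-step horizon, so the stage costs alone carry no information about it; the terminal value is the only term that pulls the selected sequence toward the goal. A learned value function conditioned on perception would play the same role in the cost, while the constraint handling and guarantees described above would come from the base controller rather than from this sampling loop.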