We generalize the derivation of model predictive path integral control (MPPI) to allow for a single joint distribution across the controls in the control sequence. This reformulation makes it possible to incorporate adaptive importance sampling (AIS) algorithms into the original importance sampling step while retaining the benefits of MPPI, such as support for arbitrary system dynamics and cost functions. We demonstrate the benefit of optimizing the proposal distribution with AIS at each control step in simulated environments, including controlling multiple cars around a track. The new algorithm is more sample efficient than MPPI, achieving better performance with fewer samples, and this performance gap grows as the dimension of the action space increases. Simulation results suggest that the new algorithm can be used as an anytime algorithm, improving the value of the control at each iteration rather than relying on a large set of samples.
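To make the idea concrete, the following is a minimal sketch of one control step in which a Gaussian proposal over the whole control sequence is adapted over a few importance sampling iterations. It is an illustration under stated assumptions, not the algorithm from the paper: the toy double-integrator model, the quadratic cost, the diagonal covariance, the weighted moment-matching update, and all names (`rollout_cost`, `diag_gauss_logpdf`, `ais_mppi_step`) are hypothetical choices made here. The weights follow the usual self-normalized importance sampling form for a target proportional to exp(-S/λ) times a zero-mean Gaussian base distribution.

```python
import numpy as np


def rollout_cost(controls, x0, dt=0.1):
    """Cost of each control sequence for a toy 1-D double integrator (assumed model)."""
    n = controls.shape[0]
    pos = np.full(n, x0[0], dtype=float)
    vel = np.full(n, x0[1], dtype=float)
    cost = np.zeros(n)
    for t in range(controls.shape[1]):
        vel = vel + controls[:, t] * dt
        pos = pos + vel * dt
        cost += pos**2 + 0.1 * vel**2  # assumed cost: drive the state to the origin
    return cost


def diag_gauss_logpdf(samples, mean, std):
    """Log-density of a diagonal Gaussian, omitting the constant -0.5*d*log(2*pi)."""
    z = (samples - mean) / std
    return -0.5 * np.sum(z**2, axis=1) - np.sum(np.log(std))


def ais_mppi_step(x0, horizon=20, n_samples=64, n_iters=5,
                  lam=1.0, base_std=1.0, min_std=0.05, seed=0):
    """One control step: adapt a joint Gaussian proposal over the control sequence."""
    rng = np.random.default_rng(seed)
    base_mean = np.zeros(horizon)             # base (uncontrolled) distribution
    base_stds = np.full(horizon, base_std)
    mean, std = base_mean.copy(), base_stds.copy()  # proposal starts at the base
    history = []
    for _ in range(n_iters):
        # Sample whole control sequences from the current proposal.
        samples = mean + std * rng.standard_normal((n_samples, horizon))
        cost = rollout_cost(samples, x0)
        # Self-normalized importance weights: exp(-S/lam) * base / proposal.
        log_w = (-cost / lam
                 + diag_gauss_logpdf(samples, base_mean, base_stds)
                 - diag_gauss_logpdf(samples, mean, std))
        log_w -= log_w.max()                  # subtract the max for numerical stability
        w = np.exp(log_w)
        w /= w.sum()
        # Adapt the proposal by weighted moment matching (assumed AIS update).
        mean = w @ samples
        var = w @ (samples - mean) ** 2
        std = np.maximum(np.sqrt(var), min_std)  # floor keeps the proposal from collapsing
        history.append(float(rollout_cost(mean[None, :], x0)[0]))
    return mean, history


if __name__ == "__main__":
    controls, history = ais_mppi_step(np.array([1.0, 0.0]))
    print("first control to apply:", controls[0])
    print("cost of the proposal mean after each iteration:", history)
```

Because the proposal mean is a usable control sequence after every iteration, the loop can be stopped early, which is the sense in which such a scheme behaves like an anytime algorithm. In a receding-horizon setting only the first control would be applied before the step is repeated from the new state.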