The ability of a robot to plan complex behaviors with real-time computation, rather than adhering to predesigned or offline-learned routines, alleviates the need for specialized algorithms or training for each problem instance. Monte Carlo Tree Search is a powerful planning algorithm that strategically explores simulated future possibilities, but it requires a discrete problem representation that is irreconcilable with the continuous dynamics of the physical world. We present Spectral Expansion Tree Search (SETS), a real-time, tree-based planner that uses the spectrum of the locally linearized system to construct a low-complexity and approximately equivalent discrete representation of the continuous world. We prove that SETS converges to within a bound of the globally optimal solution for continuous, deterministic, and differentiable Markov Decision Processes, a broad class of problems that includes underactuated nonlinear dynamics, non-convex reward functions, and unstructured environments. We experimentally validate SETS on drone, spacecraft, and ground vehicle robots and in one numerical experiment, each of which poses a problem that is not directly solvable with existing methods. We show that SETS automatically discovers a diverse set of optimal behaviors and motion trajectories in real time.
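The idea of turning a locally linearized system's spectrum into a small set of discrete branches can be illustrated with a minimal sketch. The abstract does not state which matrix's spectrum SETS uses or how branches are selected, so everything below is an assumption made for illustration: the sketch linearizes a hypothetical discrete-time model `f(x, u)` by finite differences, forms a finite-horizon controllability Gramian, and places candidate child states at the ends of the principal axes of the unit-energy reachable ellipsoid. The names `linearize`, `controllability_gramian`, `spectral_branches`, and `double_integrator` are hypothetical, and the tree search itself is not shown.

```python
import numpy as np

def linearize(f, x, u, eps=1e-6):
    """Central-difference Jacobians A = df/dx and B = df/du of x_next = f(x, u)."""
    n, m = x.size, u.size
    A = np.zeros((n, n))
    B = np.zeros((n, m))
    for i in range(n):
        dx = np.zeros(n)
        dx[i] = eps
        A[:, i] = (f(x + dx, u) - f(x - dx, u)) / (2 * eps)
    for j in range(m):
        du = np.zeros(m)
        du[j] = eps
        B[:, j] = (f(x, u + du) - f(x, u - du)) / (2 * eps)
    return A, B

def controllability_gramian(A, B, horizon):
    """Finite-horizon Gramian W = sum_{k=0}^{H-1} A^k B B^T (A^k)^T."""
    n = A.shape[0]
    W = np.zeros((n, n))
    Ak = np.eye(n)
    for _ in range(horizon):
        W += Ak @ B @ B.T @ Ak.T
        Ak = A @ Ak
    return W

def spectral_branches(f, x, u_nominal, horizon):
    """Return Gramian eigenvalues and 2n candidate child states.

    Assumption for illustration: children sit at the ends of the principal
    axes of the unit-energy reachable ellipsoid of the linearized system,
    centered on the state reached under the nominal input.
    """
    A, B = linearize(f, x, u_nominal)
    W = controllability_gramian(A, B, horizon)
    eigvals, eigvecs = np.linalg.eigh(W)  # W is symmetric positive semidefinite

    # Center of the ellipsoid: roll the model forward under the nominal input.
    x_free = x.copy()
    for _ in range(horizon):
        x_free = f(x_free, u_nominal)

    branches = []
    for lam, v in zip(eigvals, eigvecs.T):
        step = np.sqrt(max(lam, 0.0)) * v  # semi-axis of the ellipsoid
        branches.append(x_free + step)
        branches.append(x_free - step)
    return eigvals, np.array(branches)

# Hypothetical example system: a double integrator with time step 0.1.
def double_integrator(x, u, dt=0.1):
    return np.array([x[0] + dt * x[1], x[1] + dt * u[0]])

eigvals, branches = spectral_branches(
    double_integrator, np.zeros(2), np.zeros(1), horizon=10
)
```

For an n-dimensional state this produces 2n children per node regardless of the dimension of the input space, which is one way a spectral decomposition can keep the branching factor of a tree small. A planner along these lines would recompute the linearization at each node, since the Jacobians of a nonlinear system change with the state.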