We propose a diffusion approximation method for continuous-state Markov Decision Processes (MDPs) that can be used for autonomous navigation and control in unstructured off-road environments. Most decision-theoretic planning frameworks assume a fully known state transition model. Our method removes this strong assumption, which is often extremely difficult to satisfy in practice. Specifically, we take the second-order Taylor expansion of the value function. This lets us approximate the Bellman optimality equation by a partial differential equation that relies only on the first and second moments of the transition model. By combining this approximation with a kernel representation of the value function, we design an efficient policy iteration algorithm whose policy evaluation step reduces to a linear system of equations characterized by a finite set of supporting states. We first validate the proposed method through extensive simulations of 2D obstacle avoidance and 2.5D terrain navigation problems. The results show that the proposed approach substantially outperforms several baselines. We then develop a system that integrates our decision-making framework with onboard perception and conduct real-world experiments in both cluttered indoor and unstructured outdoor environments. The results from the physical systems further demonstrate the applicability of our method in challenging real-world environments.
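The abstract compresses two technical steps into a few sentences. The first is a second-order Taylor expansion that turns the Bellman equation into a PDE in the first and second moments of the transition. The second is a kernel representation that reduces policy evaluation to a linear system over supporting states. Fix a policy pi and write mu(s) = E[s' - s] and Sigma(s) = E[(s' - s)(s' - s)^T]. Expanding E[V(s')] around s gives the following approximate policy-evaluation equation: (1 - gamma) V(s) - gamma grad V(s)^T mu(s) - (gamma / 2) tr(Sigma(s) Hess V(s)) = R(s, pi(s)). The Python sketch below enforces this equation at each supporting state. It is a minimal illustration under our own assumptions, not the authors' implementation. The Gaussian RBF kernel, the collocation scheme, the ridge term, and the names `rbf_terms`, `evaluate_policy`, and `value` are all hypothetical choices made for illustration.

```python
import numpy as np


def rbf_terms(x, centers, length_scale=1.0):
    """Gaussian RBF kernel k(x, c) = exp(-||x - c||^2 / (2 l^2)) between one
    state x (shape (d,)) and each center (shape (n, d)), plus its gradient and
    Hessian with respect to x. (Kernel choice is an assumption of this sketch.)"""
    diff = x - centers                                              # (n, d)
    k = np.exp(-np.sum(diff**2, axis=1) / (2 * length_scale**2))    # (n,)
    grad = -diff / length_scale**2 * k[:, None]                     # (n, d)
    eye = np.eye(x.shape[0])[None]                                  # (1, d, d)
    outer = np.einsum("ni,nj->nij", diff, diff)                     # (n, d, d)
    hess = (outer / length_scale**4 - eye / length_scale**2) * k[:, None, None]
    return k, grad, hess


def evaluate_policy(support, reward, mean_disp, second_moment,
                    gamma=0.95, length_scale=1.0, ridge=1e-8):
    """Kernel weights alpha for V(s) = sum_i alpha_i k(s, s_i), chosen so that
    (1 - gamma) V - gamma grad V . mu - (gamma / 2) tr(Sigma Hess V) = R
    holds at every supporting state (collocation, an assumption here).

    support:       (n, d)    supporting states s_j
    reward:        (n,)      R(s_j, pi(s_j))
    mean_disp:     (n, d)    first moment  E[s' - s_j] under pi
    second_moment: (n, d, d) second moment E[(s' - s_j)(s' - s_j)^T] under pi
    """
    n = support.shape[0]
    A = np.empty((n, n))
    for j in range(n):
        # Row j: the approximated Bellman equation evaluated at support[j].
        k, grad, hess = rbf_terms(support[j], support, length_scale)
        drift = grad @ mean_disp[j]                                       # grad k . mu
        diffusion = 0.5 * np.einsum("nij,ij->n", hess, second_moment[j])  # tr(Sigma H) / 2
        A[j] = (1 - gamma) * k - gamma * (drift + diffusion)
    # Small ridge term for numerical stability (an assumption of this sketch).
    return np.linalg.solve(A + ridge * np.eye(n), reward)


def value(x, support, alpha, length_scale=1.0):
    """Evaluate the kernel value function V(x) = sum_i alpha_i k(x, s_i)."""
    k, _, _ = rbf_terms(np.atleast_1d(x), support, length_scale)
    return float(k @ alpha)
```

With zero drift and zero second moment the equation collapses to (1 - gamma) V = R. In that case the fitted values at the supporting states should equal R / (1 - gamma), which gives a quick check of the solver. A full policy iteration loop would alternate this solve with a policy-improvement step over candidate actions, each with its own moments. That step is omitted here.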