We propose a diffusion approximation method for continuous-state Markov Decision Processes (MDPs) that can be used for autonomous navigation and control in unstructured off-road environments. Most decision-theoretic planning frameworks assume a fully known state transition model; our method removes this strong assumption, which is often extremely difficult to satisfy in practice. We first take the second-order Taylor expansion of the value function. The Bellman optimality equation is then approximated by a partial differential equation that relies only on the first and second moments of the transition model. By combining this approximation with a kernel representation of the value function, we design an efficient policy iteration algorithm whose policy evaluation step reduces to a linear system of equations characterized by a finite set of supporting states. We first validate the proposed method through extensive simulations of 2D obstacle avoidance and 2.5D terrain navigation problems. The results show that the proposed approach substantially outperforms several baselines. We then develop a system that integrates our decision-making framework with onboard perception and conduct real-world experiments in both cluttered indoor and unstructured outdoor environments. The results from the physical systems further demonstrate the applicability of our method in challenging real-world environments.
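To make the policy evaluation step concrete, the following is a minimal sketch under stated assumptions, not the implementation used in this work. It assumes a discounted formulation with discount factor `gamma`, a Gaussian kernel with a hypothetical `lengthscale`, and a fixed policy whose transition model is summarized at each supporting state by the first moment `mu` and second moment `Sigma` of the state increment. Under these assumptions, the second-order Taylor expansion gives E[V(s')] ≈ V(s) + mu(s)·∇V(s) + ½ tr(Sigma(s) ∇²V(s)), and substituting V(s) = Σ_i alpha_i k(s, s_i) yields a linear system in the weights `alpha`. The function names and the optional `ridge` term are illustrative choices.

```python
import numpy as np


def gaussian_kernel_terms(S, lengthscale):
    """Kernel values, gradients, and Hessians at all pairs of supporting states.

    S has shape (n, d). For k(s, s_i) = exp(-||s - s_i||^2 / (2 l^2)), derivatives
    are taken with respect to the first argument s and evaluated at s = s_j.
    """
    d = S.shape[1]
    diff = S[:, None, :] - S[None, :, :]              # diff[j, i] = s_j - s_i
    sq_dist = np.sum(diff ** 2, axis=-1)
    K = np.exp(-sq_dist / (2.0 * lengthscale ** 2))   # shape (n, n)
    grad = -diff / lengthscale ** 2 * K[..., None]    # shape (n, n, d)
    outer = diff[..., :, None] * diff[..., None, :]   # shape (n, n, d, d)
    hess = (outer / lengthscale ** 4
            - np.eye(d) / lengthscale ** 2) * K[..., None, None]
    return K, grad, hess


def evaluate_policy(S, reward, mu, Sigma, gamma=0.9, lengthscale=0.5, ridge=0.0):
    """Approximate the value of a fixed policy at the supporting states S.

    S:      (n, d) supporting states
    reward: (n,)   reward at each supporting state under the policy
    mu:     (n, d) first moment of the state increment s' - s
    Sigma:  (n, d, d) second moment of the state increment
    Solves (1 - gamma) V - gamma (mu . grad V + 0.5 tr(Sigma Hess V)) = reward
    with V(s) = sum_i alpha_i k(s, s_i), and returns V at the supporting states.
    """
    K, grad, hess = gaussian_kernel_terms(S, lengthscale)
    drift = np.einsum("jd,jid->ji", mu, grad)                  # mu_j . grad k(s_j, s_i)
    diffusion = 0.5 * np.einsum("jab,jiab->ji", Sigma, hess)   # 0.5 tr(Sigma_j Hess k)
    A = (1.0 - gamma) * K - gamma * (drift + diffusion)
    # ridge is an optional regularizer (an assumption of this sketch) for
    # ill-conditioned kernel matrices; ridge=0.0 solves the system as stated.
    alpha = np.linalg.solve(A + ridge * np.eye(len(S)), reward)
    return K @ alpha
```

In a full policy iteration loop, a call to `evaluate_policy` would alternate with a policy improvement step that re-estimates `mu` and `Sigma` for the updated policy. When both moments are zero the state never changes, and the sketch reduces to V = reward / (1 - gamma), which gives a quick sanity check.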