The paper introduces DiSProD, an online planner developed for environments with probabilistic transitions in continuous state and action spaces. DiSProD builds a symbolic graph that captures the distribution of future trajectories, conditioned on a given policy, using independence assumptions and approximate propagation of distributions. The symbolic graph provides a differentiable representation of the policy's value, enabling efficient gradient-based optimization for long-horizon search. The propagation of approximate distributions can be seen as an aggregation of many trajectories, making it well-suited for dealing with sparse rewards and stochastic environments. An extensive experimental evaluation compares DiSProD to state-of-the-art planners in discrete-time planning and real-time control of robotic systems. The proposed method handles stochastic environments, sparse rewards, and large action spaces better than existing planners, and it is less sensitive to search depth. Additional real-world experiments demonstrate that DiSProD can control ground vehicles and surface vessels to successfully navigate around obstacles.
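The idea of propagating an approximate state distribution and then optimizing actions by gradient ascent on the resulting expected return can be illustrated with a toy sketch. This is not the DiSProD implementation: the 1-D dynamics `s' = s + a + eps`, the quadratic reward, the open-loop action sequence, and all function names (`propagate`, `expected_return`, `gradient`, `plan`) are assumptions chosen so that the mean and variance can be propagated in closed form and the gradient can be written by hand.

```python
def propagate(s0, actions, noise_var):
    """Propagate the mean and variance of the state under s' = s + a + eps,
    where eps is zero-mean noise with variance noise_var, independent per step."""
    mean, var = s0, 0.0
    means, variances = [], []
    for a in actions:
        mean = mean + a        # the mean follows the deterministic part
        var = var + noise_var  # independent additive noise accumulates
        means.append(mean)
        variances.append(var)
    return means, variances


def expected_return(s0, actions, goal, noise_var, action_cost):
    """Expected return of the action sequence, computed from moments only."""
    means, variances = propagate(s0, actions, noise_var)
    # For a quadratic reward, E[-(s - goal)^2] = -((mean - goal)^2 + var).
    reward = sum(-((m - goal) ** 2 + v) for m, v in zip(means, variances))
    return reward - action_cost * sum(a * a for a in actions)


def gradient(s0, actions, goal, noise_var, action_cost):
    """Hand-written reverse pass: d(expected_return) / d(actions[k])."""
    means, _ = propagate(s0, actions, noise_var)
    grads = [0.0] * len(actions)
    downstream = 0.0  # accumulated d(reward)/d(mean_t) for steps t >= k
    for k in reversed(range(len(actions))):
        downstream += -2.0 * (means[k] - goal)
        grads[k] = downstream - 2.0 * action_cost * actions[k]
    # In this toy model the variance does not depend on the actions,
    # so it contributes nothing to the gradient.
    return grads


def plan(s0, goal, horizon=5, noise_var=0.01, action_cost=0.1,
         lr=0.02, steps=500):
    """Gradient ascent on the expected return over an open-loop action sequence."""
    actions = [0.0] * horizon
    for _ in range(steps):
        grads = gradient(s0, actions, goal, noise_var, action_cost)
        actions = [a + lr * g for a, g in zip(actions, grads)]
    return actions


if __name__ == "__main__":
    planned = plan(s0=0.0, goal=1.0)
    print("planned actions:", [round(a, 3) for a in planned])
    print("expected return:", expected_return(0.0, planned, 1.0, 0.01, 0.1))
```

Because the return is evaluated from propagated moments rather than from sampled rollouts, a single pass summarizes many possible trajectories. A realistic planner would need nonlinear dynamics, an approximation of how the distribution passes through them, and automatic differentiation in place of the hand-written gradient.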