The paper introduces DiSProD, an online planner for environments with probabilistic transitions in continuous state and action spaces. DiSProD builds a symbolic graph that captures the distribution of future trajectories conditioned on a given policy, using independence assumptions and approximate propagation of distributions. The symbolic graph provides a differentiable representation of the policy's value, enabling efficient gradient-based optimization for long-horizon search. Because propagating approximate distributions can be viewed as aggregating many trajectories, the approach is well suited to sparse rewards and stochastic environments. An extensive experimental evaluation compares DiSProD to state-of-the-art planners in discrete-time planning and in real-time control of robotic systems. DiSProD improves on existing planners in handling stochastic environments, in reduced sensitivity to search depth, and in coping with sparse rewards and large action spaces. Additional real-world experiments demonstrate that DiSProD can control ground vehicles and surface vessels to navigate around obstacles.
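To make the idea of "approximate propagation of distributions" yielding a "differentiable representation of the policy's value" more concrete, the sketch below propagates only a mean and a variance through a toy 1-D stochastic system. It then runs gradient ascent on the action means. Everything here is an illustrative assumption and not the authors' implementation. The dynamics `s' = s + DT*tanh(a) + noise`, the quadratic reward, the constants `DT`, `NOISE_VAR`, `GOAL`, and `ACTION_VAR`, and the Taylor-based moment formulas are all hypothetical choices. Independence is modeled by tracking no state/action covariance. Central finite differences stand in for the automatic differentiation a real planner would use.

```python
import math

# Hypothetical toy problem (not from the paper): 1-D state, 1-D action.
DT = 0.5          # time step
NOISE_VAR = 0.01  # variance of additive Gaussian transition noise
GOAL = 1.0        # reward peaks when the state equals GOAL
ACTION_VAR = 0.1  # fixed variance of the stochastic policy at every step


def propagate(m, v, mu, var_a):
    """One step of approximate moment propagation for s' = s + DT*tanh(a) + noise.

    State and action are treated as independent (no covariance is tracked).
    The mean includes a second-order Taylor term in the action; the variance
    uses a first-order (linearized) term plus the noise variance.
    """
    t = math.tanh(mu)
    sech2 = 1.0 - t * t
    f_a = DT * sech2                 # df/da evaluated at a = mu
    f_aa = -2.0 * DT * t * sech2     # d2f/da2 evaluated at a = mu
    m_next = m + DT * t + 0.5 * f_aa * var_a
    v_next = v + (f_a ** 2) * var_a + NOISE_VAR  # df/ds = 1 for this model
    return m_next, v_next


def expected_reward(m, v):
    """E[-(s - GOAL)^2] for s with mean m and variance v (exact for a quadratic)."""
    return -((m - GOAL) ** 2 + v)


def value(mus, m0=0.0, v0=0.0, var_a=ACTION_VAR):
    """Sum of expected rewards along the propagated distribution: one smooth
    pass that stands in for averaging over many sampled trajectories."""
    m, v, total = m0, v0, 0.0
    for mu in mus:
        m, v = propagate(m, v, mu, var_a)
        total += expected_reward(m, v)
    return total


def plan(horizon=4, steps=200, lr=0.05, eps=1e-5):
    """Gradient ascent on the per-step action means.

    Central finite differences replace automatic differentiation only to keep
    this sketch dependency-free.
    """
    mus = [0.0] * horizon
    for _ in range(steps):
        grads = []
        for i in range(horizon):
            up = list(mus)
            up[i] += eps
            down = list(mus)
            down[i] -= eps
            grads.append((value(up) - value(down)) / (2 * eps))
        mus = [mu + lr * g for mu, g in zip(mus, grads)]
    return mus


if __name__ == "__main__":
    best = plan()
    print("action means:", [round(mu, 3) for mu in best])
    print("value before:", round(value([0.0] * 4), 3), "after:", round(value(best), 3))
```

A single forward pass through `value` scores an entire distribution of trajectories. This is why a gradient signal exists even when individual sampled rollouts would rarely reach the goal. In a real planner the toy dynamics would be replaced by the environment model, and finite differences by an autodiff framework.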