Real-world robot task planning must operate under both stochastic action execution and partial observability, yet constructing Partially Observable Markov Decision Process (POMDP) models for real robotics domains remains difficult and labor-intensive. We introduce PO-PDDL, a symbolic formulation of POMDPs that preserves the relational structure and LLM-friendly syntax of the Planning Domain Definition Language (PDDL), while explicitly modeling partial observability, stochasticity, and beliefs. Building on this formulation, we propose a demonstration-driven pipeline for learning PO-PDDL models. The proposed method reconstructs latent symbolic state trajectories from real-robot execution videos, identifies partial observability via inconsistencies between inferred states and visual observations, and learns stochastic transition and observation models accordingly. The resulting PO-PDDL domains are reusable across tasks and enable online belief-space planning under both perception and execution uncertainty. Experiments on real-world long-horizon manipulation tasks show that our method consistently outperforms existing PDDL and POMDP model-learning approaches, achieving robust task planning under uncertainty with significantly lower planning cost.
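To make the notion of a belief concrete, the following is a minimal sketch of a discrete Bayes-filter belief update, the standard operation underlying belief-space planning in a POMDP. It is not the PO-PDDL syntax or the authors' implementation: the state names, the `look_in_drawer` action, and all probabilities are hypothetical values chosen for illustration only.

```python
from typing import Dict, Tuple

# Hypothetical type aliases for illustration; not part of PO-PDDL.
State = str
Action = str
Obs = str
Belief = Dict[State, float]
TransitionModel = Dict[Tuple[State, Action], Dict[State, float]]   # P(s' | s, a)
ObservationModel = Dict[Tuple[State, Action], Dict[Obs, float]]    # P(o | s', a)


def belief_update(belief: Belief, action: Action, observation: Obs,
                  T: TransitionModel, O: ObservationModel) -> Belief:
    """Standard POMDP belief update: b'(s') is proportional to
    P(o | s', a) * sum_s P(s' | s, a) * b(s)."""
    # Prediction step: push the belief through the stochastic transition model.
    predicted: Belief = {}
    for s, p in belief.items():
        for s_next, p_trans in T[(s, action)].items():
            predicted[s_next] = predicted.get(s_next, 0.0) + p * p_trans

    # Correction step: weight each predicted state by the observation likelihood.
    unnormalized = {
        s_next: p * O[(s_next, action)].get(observation, 0.0)
        for s_next, p in predicted.items()
    }
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("Observation has zero probability under the model.")
    return {s: p / total for s, p in unnormalized.items()}


# Toy domain (assumed values): a cup is hidden in a drawer or a cabinet.
# Looking does not move the cup, but detection is noisy.
T: TransitionModel = {
    ("in_drawer", "look_in_drawer"): {"in_drawer": 1.0},
    ("in_cabinet", "look_in_drawer"): {"in_cabinet": 1.0},
}
O: ObservationModel = {
    ("in_drawer", "look_in_drawer"): {"seen": 0.8, "not_seen": 0.2},
    ("in_cabinet", "look_in_drawer"): {"seen": 0.1, "not_seen": 0.9},
}

prior: Belief = {"in_drawer": 0.5, "in_cabinet": 0.5}
posterior = belief_update(prior, "look_in_drawer", "not_seen", T, O)
print(posterior)
```

In this toy setting, failing to see the cup shifts probability mass toward the cabinet without ruling out the drawer, which is the kind of perception uncertainty a belief-space planner has to reason over.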