Optimal plans in Constrained Partially Observable Markov Decision Processes (CPOMDPs) maximize reward objectives while satisfying hard cost constraints, generalizing safe planning under state and transition uncertainty. Unfortunately, online CPOMDP planning is extremely difficult in large or continuous problem domains. In many large robotic domains, hierarchical decomposition can simplify planning: existing tools handle low-level control, and the planner reasons only over high-level action primitives (options). We introduce Constrained Options Belief Tree Search (COBeTS) to leverage this hierarchy and scale online search-based CPOMDP planning to large robotic problems. We show that if primitive option controllers are defined to satisfy assigned constraint budgets, then COBeTS satisfies the constraints in an anytime manner. Otherwise, COBeTS guides the search towards a safe sequence of option primitives, and hierarchical monitoring can be used to achieve runtime safety. We demonstrate COBeTS in several safety-critical, constrained partially observable robotic domains, showing that it can plan successfully in continuous CPOMDPs while non-hierarchical baselines cannot.
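For readers unfamiliar with the setting, the constrained objective referred to above is commonly written as follows. This is a sketch of the usual CPOMDP formulation, not notation taken from this abstract: the symbols $\pi$ (policy), $b_0$ (initial belief), $\gamma$ (discount factor), $R$ (reward), $C_k$ (the $k$-th cost function), and $\hat{c}_k$ (its budget) are assumptions introduced here for illustration.

```latex
\begin{align}
  \max_{\pi} \quad
    & V_R^{\pi}(b_0)
      = \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} R(s_t, a_t) \,\middle|\, b_0 \right] \\
  \text{subject to} \quad
    & V_{C_k}^{\pi}(b_0)
      = \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} C_k(s_t, a_t) \,\middle|\, b_0 \right]
      \le \hat{c}_k,
      \qquad k = 1, \dots, K .
\end{align}
```

Under this reading, the "assigned constraint budgets" for options can be thought of as shares of each $\hat{c}_k$ that an option's low-level controller is expected to respect while it executes; how the budgets are actually assigned is not specified in the abstract.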