Sequential decision-making for robots in the real world is challenging because robots must reason about the current world state and its dynamics while planning actions to accomplish complex tasks. On the one hand, declarative languages and reasoning algorithms are well suited to representing and reasoning with commonsense knowledge, but they are poorly suited to planning actions that maximize cumulative reward over a long, unspecified horizon. On the other hand, probabilistic planning frameworks, such as Markov decision processes (MDPs) and partially observable MDPs (POMDPs), are well suited to planning toward long-term goals under uncertainty, but they are ill-equipped to represent or reason about knowledge that is not directly related to actions. In this article, we present a novel algorithm, called iCORPP, that simultaneously estimates the current world state, reasons about world dynamics, and constructs task-oriented controllers. In this process, a robot decision-making problem is decomposed into two interdependent, smaller subproblems: reasoning to "understand the world" and planning to "achieve the goal". Contextual knowledge is represented in the reasoning component, which makes the planning component epistemic and enables active information gathering. The algorithm has been implemented and evaluated both in simulation and on real robots using everyday service tasks, such as indoor navigation, dialog management, and object delivery. Results show significant improvements in scalability, efficiency, and adaptiveness compared to competitive baselines, including handcrafted action policies.