Open-world, language-conditioned task planning is crucial for robots operating in large-scale household environments. While many recent works attempt to address this problem with Large Language Models (LLMs) via prompting or training, a key challenge remains scalability: performance often degrades rapidly with increasing environment size, plan length, instruction ambiguity, and constraint complexity. In this work, we propose Any House Any Task (AHAT), a household task planner optimized for long-horizon planning in large environments given ambiguous human instructions. At its core, AHAT uses an LLM trained to map task instructions and textual scene graphs into grounded subgoals expressed in the Planning Domain Definition Language (PDDL). These subgoals are then solved via explicit symbolic reasoning to produce feasible, optimal long-horizon plans. To enhance the model's ability to decompose complex and ambiguous intentions, we introduce TGPO, a novel reinforcement learning algorithm that integrates external correction of intermediate reasoning traces into Group Relative Policy Optimization (GRPO). Experiments demonstrate that AHAT achieves significant performance gains over state-of-the-art prompting, planning, and learning methods, particularly on human-style household tasks characterized by brief instructions that nonetheless require complex execution plans.