Recent advances in language-conditioned robotic manipulation have used imitation learning and reinforcement learning to let robots execute tasks from human commands. However, these methods often generalize and adapt poorly, and, unlike data-rich domains such as computer vision, they lack large-scale specialized datasets, which makes long-horizon task execution challenging. To address these gaps, we introduce DAHLIA, a data-agnostic framework for language-conditioned long-horizon robotic manipulation that uses large language models (LLMs) for real-time task planning and execution. DAHLIA employs a dual-tunnel architecture: an LLM-powered planner collaborates with co-planners to decompose tasks and generate executable plans, while a reporter LLM provides closed-loop feedback that enables adaptive re-planning and recovery from potential failures. DAHLIA also integrates chain-of-thought (CoT) task reasoning and temporal abstraction for efficient action execution, which improves traceability and robustness. The framework demonstrates state-of-the-art performance across diverse long-horizon tasks and generalizes well in both simulated and real-world scenarios. Videos and code are available at https://ghiara.github.io/DAHLIA/.
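The closed-loop interplay between planner and reporter described in the abstract can be illustrated with a minimal sketch. This is not DAHLIA's implementation: every name here (`Report`, `run_closed_loop`, `toy_planner`, `toy_executor`, `toy_reporter`, `max_attempts`) is hypothetical, the planner and reporter are plain Python functions standing in for LLM calls, and co-planners, chain-of-thought reasoning, and temporal abstraction are not modeled.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Report:
    """Hypothetical reporter verdict: did the plan succeed, and if not, why."""
    success: bool
    feedback: str


def run_closed_loop(
    task: str,
    planner: Callable[[str, str], List[str]],
    executor: Callable[[str], str],
    reporter: Callable[[str, List[str], List[str]], Report],
    max_attempts: int = 3,
) -> List[Tuple[List[str], Report]]:
    """Plan, execute, report, and re-plan until success or attempts run out."""
    feedback = ""  # empty on the first attempt; filled in by the reporter
    history: List[Tuple[List[str], Report]] = []
    for _ in range(max_attempts):
        plan = planner(task, feedback)                 # decompose the task
        observations = [executor(step) for step in plan]
        report = reporter(task, plan, observations)    # closed-loop check
        history.append((plan, report))
        if report.success:
            break
        feedback = report.feedback                     # feeds the re-plan
    return history


# Toy stand-ins (assumptions) so the loop can run without an LLM or a robot.
def toy_planner(task: str, feedback: str) -> List[str]:
    steps = ["pick red block", "place red block on blue block"]
    if feedback:  # react to the reporter by prepending a recovery step
        steps = ["open gripper"] + steps
    return steps


def toy_executor(step: str) -> str:
    return f"executed: {step}"


def toy_reporter(task: str, plan: List[str], observations: List[str]) -> Report:
    if "open gripper" in plan:
        return Report(True, "")
    return Report(False, "gripper was closed before pick")


if __name__ == "__main__":
    for plan, report in run_closed_loop(
        "stack red block on blue block", toy_planner, toy_executor, toy_reporter
    ):
        print(plan, report)
```

In this toy run the first plan fails, the reporter's feedback reaches the planner, and the second plan adds a recovery step. Capping the number of attempts is a design choice of the sketch, not something the abstract specifies.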