Embodied instruction following (EIF) requires agents to understand and execute complex natural language commands within interactive 3D environments. Despite recent advances, existing methods often fail at long-horizon planning and at handling irreversible state changes, which results in low task success rates. To address these challenges, we introduce RePlan-Bot, a novel EIF agent that replans continuously at multiple levels throughout task execution. RePlan-Bot integrates three core components: a high-level LLM-based auditor that dynamically adjusts sub-goals based on environmental feedback, a commonsense-guided search mechanism built on a multi-layered instance map for precise, structured object localization, and a lightweight ViT-based corrector that preemptively fixes risky low-level actions. Evaluated on the ALFRED benchmark, RePlan-Bot achieves state-of-the-art performance in both seen and unseen environments, demonstrating superior adaptability and reliability.