Open-world embodied agents must solve long-horizon tasks in which the main bottleneck is not single-step planning quality but how interaction experience is organized and evolved. To this end, we present Steve-Evolving, a non-parametric self-evolving framework that tightly couples fine-grained execution diagnosis with dual-track knowledge distillation in a closed loop. The method has three phases: Experience Anchoring, Experience Distillation, and Knowledge-Driven Closed-Loop Control. Experience Anchoring solidifies each subgoal attempt into a structured experience tuple with a fixed schema (pre-state, action, diagnosis result, and post-state) and organizes these tuples in a three-tier experience space with multi-dimensional indices (e.g., condition signatures, spatial hashing, and semantic tags) plus rolling summarization for efficient and auditable recall. To ensure sufficient information density for attribution, the execution layer provides compositional diagnosis signals beyond binary outcomes, including state-difference summaries, enumerated failure causes, continuous indicators, and stagnation/loop detection. In Experience Distillation, successful trajectories are generalized into reusable skills with explicit preconditions and verification criteria, while failures are distilled into executable guardrails that capture root causes and forbid risky operations at both subgoal and task granularities. In Knowledge-Driven Closed-Loop Control, retrieved skills and guardrails are injected into an LLM planner, and diagnosis-triggered local replanning updates the active constraints online, forming a continual evolution process without any model parameter updates. Experiments on the long-horizon suite of Minecraft MCU demonstrate consistent improvements over static-retrieval baselines.
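The three phases described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: every class, field, and function name below (`Diagnosis`, `Experience`, `ExperienceStore`, `planner_context`) is hypothetical, the three-tier experience space is reduced to a single index keyed by subgoal, and the Minecraft states are made-up examples. The sketch only shows how a fixed-schema experience tuple might be anchored, split into skills and guardrails, and filtered against the current state before being passed to a planner.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Diagnosis:
    """Hypothetical diagnosis record that goes beyond a binary outcome."""
    success: bool
    failure_cause: str | None = None  # enumerated cause (assumed labels)
    progress: float = 0.0             # continuous indicator in [0, 1]
    stagnated: bool = False           # stagnation/loop flag


@dataclass
class Experience:
    """Fixed-schema tuple: pre-state, action, diagnosis result, post-state."""
    subgoal: str
    pre_state: dict
    action: str
    diagnosis: Diagnosis
    post_state: dict


class ExperienceStore:
    """Simplified experience space: one index keyed by subgoal tag."""

    def __init__(self) -> None:
        self._by_subgoal: dict[str, list[Experience]] = defaultdict(list)

    def anchor(self, exp: Experience) -> None:
        # Experience Anchoring: record every subgoal attempt.
        self._by_subgoal[exp.subgoal].append(exp)

    def distill(self, subgoal: str) -> tuple[list[dict], list[dict]]:
        # Experience Distillation: successes become skills, failures guardrails.
        skills, guardrails = [], []
        for exp in self._by_subgoal.get(subgoal, []):
            if exp.diagnosis.success:
                skills.append({"action": exp.action,
                               "precondition": exp.pre_state,
                               "verify": exp.post_state})
            else:
                guardrails.append({"forbid": exp.action,
                                   "when": exp.pre_state,
                                   "cause": exp.diagnosis.failure_cause})
        return skills, guardrails


def planner_context(store: ExperienceStore, subgoal: str, state: dict) -> dict:
    """Select the skills and guardrails whose conditions hold in `state`."""
    def holds(cond: dict) -> bool:
        return all(state.get(key) == value for key, value in cond.items())

    skills, guardrails = store.distill(subgoal)
    return {
        "skills": [s["action"] for s in skills if holds(s["precondition"])],
        "forbidden": [g["forbid"] for g in guardrails if holds(g["when"])],
    }


store = ExperienceStore()
store.anchor(Experience("mine_iron", {"pickaxe": "wooden"}, "mine(iron_ore)",
                        Diagnosis(False, failure_cause="tool_too_weak"),
                        {"pickaxe": "wooden"}))
store.anchor(Experience("mine_iron", {"pickaxe": "stone"}, "mine(iron_ore)",
                        Diagnosis(True, progress=1.0),
                        {"pickaxe": "stone", "raw_iron": 1}))

ctx_wooden = planner_context(store, "mine_iron", {"pickaxe": "wooden"})
ctx_stone = planner_context(store, "mine_iron", {"pickaxe": "stone"})
```

In this toy setup the same action is forbidden when the agent holds a wooden pickaxe and offered as a skill when it holds a stone one, so the planner's constraints change with the retrieved experience while no model parameters are updated. A fuller version would also need the multi-dimensional indices, rolling summaries, and diagnosis-triggered replanning that the abstract mentions.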