We address the challenge of utilizing large language models (LLMs) for complex embodied tasks in environments where decision-making systems must operate in a timely manner on capacity-limited, off-the-shelf devices. We present DeDer, a framework for decomposing and distilling the embodied reasoning capabilities of LLMs into efficient, small language model (sLM)-based policies. In DeDer, the decision-making process of LLM-based strategies is restructured into a hierarchy with a reasoning-policy and a planning-policy. The reasoning-policy is distilled from data generated through the embodied in-context learning and self-verification of an LLM, so that it can produce effective rationales. The planning-policy, guided by the rationales, can render optimized plans efficiently. In turn, DeDer allows both policies to be implemented with sLMs and deployed on off-the-shelf devices. Furthermore, to enhance the quality of intermediate rationales specific to embodied tasks, we devise the embodied knowledge graph, and to generate multiple rationales in a timely manner through a single inference, we use the contrastively prompted attention model. Our experiments on the ALFRED benchmark demonstrate that DeDer surpasses leading language planning and distillation approaches, indicating the applicability and efficiency of sLM-based embodied policies derived through DeDer.