Despite widespread success across various applications, large language models (LLMs) often stumble when tackling basic physical reasoning or executing robotics tasks, due to a lack of direct experience with the physical nuances of the real world. To address these issues, we propose Grounding Large language models with Imperfect world MOdels (GLIMO), which utilizes proxy world models such as simulators to collect and synthesize training data. GLIMO incorporates an LLM-agent-based data generator that automatically creates high-quality and diverse instruction datasets. The generator includes an iterative self-refining module for temporally consistent experience sampling, a diverse set of question-answering instruction seeds, and a retrieval-augmented generation module for reflecting on prior experiences. Comprehensive experiments show that our approach improves the performance of strong open-source LLMs such as LLaMA-3, with performance boosts of 2.04 $\times$, 1.54 $\times$, and 1.82 $\times$ across three different benchmarks, respectively. The resulting models are able to compete with or surpass larger counterparts such as GPT-4.