LLMPhy: Complex Physical Reasoning Using Large Language Models and World Models

Physical reasoning is an important skill for robotic agents operating in the real world. However, solving such reasoning problems often involves hypothesizing and reflecting over complex multi-body interactions under a multitude of physical forces, so learning all such interactions poses a significant hurdle for state-of-the-art machine learning frameworks, including large language models (LLMs). To study this problem, we propose a new physical reasoning task and dataset, dubbed TraySim. The task involves predicting the dynamics of several objects on a tray that is given an external impact. The domino effect of the ensuing object interactions and their dynamics offers a challenging yet controlled setup, and the goal of the reasoning is to infer the stability of the objects after the impact. To solve this task, we present LLMPhy, a zero-shot black-box optimization framework that leverages the physics knowledge and program synthesis abilities of LLMs and combines them with the world models built into modern physics engines. Specifically, LLMPhy uses an LLM to generate code that iteratively estimates the physical hyperparameters of the system (friction, damping, layout, etc.) via an implicit analysis-by-synthesis approach with a (non-differentiable) simulator in the loop, and then uses the inferred parameters to imagine the dynamics of the scene in order to solve the reasoning task. To show the effectiveness of LLMPhy, we present experiments on our TraySim dataset that predict the steady-state poses of the objects. Our results show that combining the LLM with the physics engine leads to state-of-the-art zero-shot physical reasoning performance, while converging better than standard black-box optimization methods and estimating the physical parameters more accurately.
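To make the simulator-in-the-loop idea concrete, here is a minimal sketch of analysis-by-synthesis parameter estimation against a non-differentiable simulator. It is an illustration under stated assumptions, not the LLMPhy implementation: `simulate` is a hypothetical toy model (a block sliding under friction and damping) standing in for a physics engine, and `propose` is a hypothetical stand-in that uses random perturbation where LLMPhy would query an LLM. All names, value ranges, and the squared-error loss are assumptions made for this example.

```python
import random


def simulate(friction, damping, v0=2.0, steps=300, dt=0.01):
    """Toy stand-in for a physics engine (assumption, not the real one).

    A block starts with speed v0 and decelerates under Coulomb friction
    and linear damping. Returns the distance travelled before it stops.
    """
    g = 9.81
    x, v = 0.0, v0
    for _ in range(steps):
        if v <= 0.0:
            break
        a = -friction * g - damping * v
        v = max(0.0, v + a * dt)
        x += v * dt
    return x


def loss(params, observed):
    """Squared error between simulated and observed displacements."""
    return sum((simulate(*params, v0=v0) - x_obs) ** 2 for v0, x_obs in observed)


def propose(best, history, rng, scale=0.1):
    """Hypothetical proposer: perturbs the best parameters found so far.

    In the abstract this role is played by an LLM that reads earlier
    attempts and their errors; `history` is passed in to mark where that
    feedback would go, but this random stand-in ignores it.
    """
    friction, damping = best
    friction = min(1.0, max(0.0, friction + rng.gauss(0.0, scale)))
    damping = min(5.0, max(0.0, damping + rng.gauss(0.0, scale)))
    return (friction, damping)


def estimate(observed, iters=300, seed=0):
    """Black-box loop: propose parameters, simulate, keep the best."""
    rng = random.Random(seed)
    best = (0.5, 0.5)  # arbitrary initial guess
    best_loss = loss(best, observed)
    history = [(best, best_loss)]
    for _ in range(iters):
        candidate = propose(best, history, rng)
        candidate_loss = loss(candidate, observed)
        history.append((candidate, candidate_loss))
        if candidate_loss < best_loss:
            best, best_loss = candidate, candidate_loss
    return best, best_loss, history


if __name__ == "__main__":
    # Synthetic "observations" generated from hidden ground-truth parameters.
    observed = [(v0, simulate(0.3, 0.8, v0=v0)) for v0 in (1.0, 2.0, 3.0)]
    best, best_loss, _ = estimate(observed)
    print("estimated (friction, damping):", best, "loss:", best_loss)
```

The loop never needs gradients from the simulator: it only compares simulated outcomes with observations, which is what makes the approach applicable to a non-differentiable engine. Replacing `propose` with a model that reasons over `history` is the part this sketch leaves out.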

