When humans sculpt, we reason about how the geometry of the clay must change to reach the target shape. We do not compute point-wise similarity metrics or reason about the low-level positioning of our tools; instead, we determine the higher-level changes that need to be made. In this work, we propose LLM-Craft, a novel pipeline that leverages large language models (LLMs) to iteratively reason about and generate deformation-based crafting action sequences. We simplify and couple the state and action representations to further encourage shape-based reasoning. To the best of our knowledge, LLM-Craft is the first system to successfully leverage LLMs for complex deformable object interactions. Our experiments demonstrate that, within the LLM-Craft framework, LLMs can reason about the deformation behavior of elasto-plastic objects. Furthermore, we find that LLM-Craft can successfully create a set of simple letter shapes. Finally, we explore extending the framework to more ambiguous semantic goals, such as "thinner" or "bumpy". For videos, please see our website: https://sites.google.com/andrew.cmu.edu/llmcraft.