Engineering construction automation aims to transform natural language specifications into physically viable structures, which requires complex, integrated reasoning under strict physical constraints. Modern LLMs possess broad knowledge and strong reasoning capabilities that make them promising candidates for this domain, yet their construction competencies remain largely unevaluated. To address this gap, we introduce BuildArena, the first physics-aligned interactive benchmark designed for language-driven engineering construction. It makes two technical contributions: (1) an extensible task design strategy that spans static and dynamic mechanics across multiple difficulty tiers; and (2) a 3D Spatial Geometric Computation Library that supports construction from language instructions. Using BuildArena, we evaluate nine frontier LLMs and three additional open-weight models on language-driven, physics-grounded construction automation. We release the code at https://github.com/AI4Science-WestlakeU/BuildArena to benefit construction automation in engineering applications.