Engineering construction automation aims to transform natural language specifications into physically viable structures, which requires complex, integrated reasoning under strict physical constraints. Modern LLMs possess broad knowledge and strong reasoning capabilities that make them promising candidates for this domain, yet their construction competencies remain largely unevaluated. To address this gap, we introduce BuildArena, the first physics-aligned interactive benchmark for language-driven engineering construction, as a first step towards engineering automation with LLMs. It makes two technical contributions: (1) an extensible task design strategy spanning static and dynamic mechanics across multiple difficulty tiers; and (2) a 3D Spatial Geometric Computation Library that supports construction from language instructions. Using BuildArena, we evaluate nine frontier LLMs on language-driven, physics-grounded construction automation.