Modeling industrial scenes is essential for simulation in industrial manufacturing. While large language models (LLMs) have made significant progress in generating general 3D scenes from textual descriptions, generating industrial scenes with LLMs poses a unique challenge: such scenes demand precise measurements and positioning, which requires complex planning over spatial arrangement. To address this challenge, we introduce SceneGenAgent, an LLM-based agent that generates industrial scenes through C# code. SceneGenAgent ensures precise layout planning through a structured and calculable format, layout verification, and iterative refinement, so that the quantitative requirements of industrial scenarios are met. Experimental results demonstrate that LLMs powered by SceneGenAgent exceed their original performance, reaching a success rate of up to 81.0% on real-world industrial scene generation tasks and effectively meeting most scene generation requirements. To further enhance accessibility, we construct SceneInstruct, a dataset designed for fine-tuning open-source LLMs for integration into SceneGenAgent. Experiments show that fine-tuning open-source LLMs on SceneInstruct yields significant performance improvements, with Llama3.1-70B approaching the capabilities of GPT-4o. Our code and data are available at https://github.com/THUDM/SceneGenAgent.
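To make the idea of a calculable layout format with verification and iterative refinement concrete, the following is a minimal sketch, not SceneGenAgent's actual implementation. It assumes each object is reduced to an axis-aligned footprint on the floor plane, and it replaces the LLM-driven refinement step with a simple deterministic rule (push the second object along +x until it clears the first). The names `Box`, `overlaps`, `find_conflicts`, and `refine` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned footprint of an object on the floor plane (hypothetical format)."""
    name: str
    x: float  # centre x, in metres
    y: float  # centre y, in metres
    w: float  # extent along x, in metres
    d: float  # extent along y, in metres


def overlaps(a: Box, b: Box, gap: float = 0.0) -> bool:
    """True if the footprints intersect or are closer than `gap` on both axes."""
    return (abs(a.x - b.x) < (a.w + b.w) / 2 + gap
            and abs(a.y - b.y) < (a.d + b.d) / 2 + gap)


def find_conflicts(boxes: list[Box], gap: float = 0.0) -> list[tuple[str, str]]:
    """Verification step: list every pair of objects that violates the clearance."""
    return [(a.name, b.name)
            for i, a in enumerate(boxes)
            for b in boxes[i + 1:]
            if overlaps(a, b, gap)]


def refine(boxes: list[Box], gap: float = 0.0, max_rounds: int = 10) -> bool:
    """Iterative refinement: verify, repair conflicts, and repeat.

    The repair rule here is a stand-in assumption for an LLM revision step.
    Returns True if the final layout passes verification.
    """
    by_name = {b.name: b for b in boxes}
    for _ in range(max_rounds):
        conflicts = find_conflicts(boxes, gap)
        if not conflicts:
            return True
        for a_name, b_name in conflicts:
            a, b = by_name[a_name], by_name[b_name]
            if overlaps(a, b, gap):  # may already be resolved by an earlier move
                b.x = a.x + (a.w + b.w) / 2 + gap
    return not find_conflicts(boxes, gap)


if __name__ == "__main__":
    layout = [Box("robot", 0.0, 0.0, 2.0, 2.0), Box("conveyor", 1.0, 0.0, 4.0, 1.0)]
    print("conflicts before:", find_conflicts(layout, gap=0.5))
    print("layout valid after refinement:", refine(layout, gap=0.5))
```

Because positions and extents are plain numbers, the verification step is an exact computation rather than a judgment call, which is the property a calculable layout format is meant to provide. The round limit keeps the loop bounded when a repair introduces new conflicts.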