Spatial intelligence is foundational to AI systems that interact with the physical world, particularly in 3D scene generation and spatial comprehension. Current methodologies for 3D scene generation often rely heavily on predefined datasets and struggle to adapt dynamically to changing spatial relationships. In this paper, we introduce \textbf{GraphCanvas3D}, a programmable, extensible, and adaptable framework for controllable 3D scene generation. Leveraging in-context learning, GraphCanvas3D enables dynamic adaptability without retraining, supporting flexible and customizable scene creation. Our framework employs hierarchical, graph-driven scene descriptions, representing spatial elements as graph nodes and establishing coherent relationships among objects in 3D environments. Unlike conventional approaches, which are constrained in adaptability and often require predefined input masks or retraining for modifications, GraphCanvas3D allows for seamless object manipulation and scene adjustments on the fly. Additionally, GraphCanvas3D supports 4D scene generation, incorporating temporal dynamics to model changes over time. Experimental results and user studies demonstrate that GraphCanvas3D enhances usability, flexibility, and adaptability for scene generation. Our code and models are available on the project website: https://github.com/ILGLJ/Graph-Canvas.
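To make the graph-driven scene description concrete, the sketch below shows one minimal way such a representation could be organized: objects as nodes, spatial relations as labeled edges, and an edit operation that adjusts a relation without rebuilding the scene. The class and method names here are illustrative assumptions, not the actual GraphCanvas3D API.

```python
from dataclasses import dataclass, field

@dataclass
class SceneNode:
    """A spatial element in the scene (hypothetical minimal schema)."""
    name: str
    attributes: dict = field(default_factory=dict)

@dataclass
class SceneGraph:
    """Graph-driven scene description: nodes are objects, edges are
    spatial relations stored as (subject, relation, object) triples."""
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add_object(self, name, **attributes):
        self.nodes[name] = SceneNode(name, attributes)

    def relate(self, subject, relation, obj):
        self.edges.append((subject, relation, obj))

    def move(self, subject, new_relation, obj):
        # On-the-fly adjustment: drop the subject's old relations
        # and attach the new one, leaving the rest of the graph intact.
        self.edges = [e for e in self.edges if e[0] != subject]
        self.edges.append((subject, new_relation, obj))

graph = SceneGraph()
graph.add_object("table", height=0.8)
graph.add_object("lamp", color="white")
graph.relate("lamp", "on", "table")
graph.move("lamp", "left_of", "table")
print(graph.edges)  # [('lamp', 'left_of', 'table')]
```

In this toy version, editing a relation is a local graph update, which is the property that lets a scene be manipulated incrementally rather than regenerated from scratch.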