Spatial intelligence is foundational to AI systems that interact with the physical world, particularly in 3D scene generation and spatial comprehension. Current methodologies for 3D scene generation often rely heavily on predefined datasets and struggle to adapt dynamically to changing spatial relationships. In this paper, we introduce GraphCanvas3D, a programmable, extensible, and adaptable framework for controllable 3D scene generation. Leveraging in-context learning, GraphCanvas3D enables dynamic adaptability without retraining, supporting flexible and customizable scene creation. Our framework employs hierarchical, graph-driven scene descriptions, representing spatial elements as graph nodes and establishing coherent relationships among objects in 3D environments. Unlike conventional approaches, which are limited in adaptability and often require predefined input masks or retraining for modifications, GraphCanvas3D allows for seamless object manipulation and scene adjustment on the fly. Additionally, GraphCanvas3D supports 4D scene generation, incorporating temporal dynamics to model changes over time. Experimental results and user studies demonstrate that GraphCanvas3D improves the usability, flexibility, and adaptability of scene generation. Our code and models are available on the project website: https://github.com/ILGLJ/Graph-Canvas.
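To make the graph-driven scene description concrete, the sketch below shows one minimal way a scene could be represented as a graph, with objects as nodes and spatial relations as labeled edges. This is a hedged illustration only: the class names, relation labels, and methods here are assumptions for exposition, not GraphCanvas3D's actual API.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a graph-based scene description: objects are
# nodes, and spatial relations between objects are labeled edges.
# All names here are illustrative assumptions, not the framework's API.

@dataclass
class SceneNode:
    name: str
    position: tuple  # (x, y, z) placement in the 3D scene

@dataclass
class SceneGraph:
    nodes: dict = field(default_factory=dict)   # name -> SceneNode
    edges: list = field(default_factory=list)   # (subject, relation, object)

    def add_node(self, node: SceneNode) -> None:
        self.nodes[node.name] = node

    def relate(self, subject: str, relation: str, obj: str) -> None:
        # Record a directed spatial relation, e.g. ("bench", "next_to", "tree").
        self.edges.append((subject, relation, obj))

    def relations_of(self, name: str) -> list:
        # All outgoing relations of a given object.
        return [(rel, obj) for subj, rel, obj in self.edges if subj == name]

# Build a tiny two-object scene and query its relations.
graph = SceneGraph()
graph.add_node(SceneNode("tree", (0.0, 0.0, 0.0)))
graph.add_node(SceneNode("bench", (1.5, 0.0, 0.0)))
graph.relate("bench", "next_to", "tree")
print(graph.relations_of("bench"))  # → [('next_to', 'tree')]
```

Editing a scene in this representation amounts to adding, removing, or relabeling nodes and edges, which is what allows object manipulation without retraining or predefined masks.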