Current 3D scene graph generation (3DSGG) approaches rely heavily on a single-agent assumption and small-scale environments, which limits their scalability to real-world scenarios. In this work, we introduce the Multi-Agent 3D Scene Graph Generation (MA3DSG) model, the first framework designed to tackle this scalability challenge using multiple agents. We develop a training-free graph alignment algorithm that efficiently merges partial query graphs from individual agents into a unified global scene graph. Leveraging extensive analysis and empirical insights, our approach enables conventional single-agent systems to operate collaboratively without requiring any learnable parameters. To rigorously evaluate 3DSGG performance, we propose MA3DSG-Bench, a benchmark that supports diverse agent configurations, domain sizes, and environmental conditions, providing a more general and extensible evaluation framework. This work lays a solid foundation for scalable, multi-agent 3DSGG research.