Utilizing functional elements in an industrial environment, such as displays and interactive valves, provides effective possibilities for robot training. When preparing simulations for robots or applications that involve high-level scene understanding, the simulation environment must be equally detailed. Although CAD files for such environments deliver an exact description of the geometry and visuals, they usually lack semantic, relational and functional information, which limits the simulation and training possibilities. A 3D scene graph can organize semantic, spatial and functional information once the environment has been enriched through a Large Vision-Language Model (LVLM). In this paper, we present an offline approach to creating detailed 3D scene graphs from CAD environments. This serves as a foundation for including the relations of functional and actionable elements, which can then be used for dynamic simulation and reasoning. Key results of this research include quantitative results for the generated semantic labels and qualitative results for the scene graph, especially with regard to pipe structures and identified functional relations. All code, results and the environment will be made available at https://cad-scenegraph.github.io