This paper presents a neural-network-based semantic plane detection method that uses polygon representations. The method can, for example, be used to solve room layout estimation tasks. It builds on, combines, and further develops several modules from previous research. The network takes an RGB image and, using an hourglass backbone, estimates a wireframe as well as a feature space. Line and junction features are then sampled from these. The lines and junctions are represented as an undirected graph, from which polygon representations of the sought planes are obtained. Two different methods for this last step are investigated; the most promising one is built on a heterogeneous graph transformer. In all cases, the final output is a 2D projection of the semantic planes. The methods are evaluated on the Structured3D dataset, and we investigate performance using both sampled and estimated wireframes. The experiments show the potential of the graph-based method, which outperforms state-of-the-art room layout estimation methods on the 2D metrics when using synthetic wireframe detections.
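The feature sampling and graph construction steps described above can be sketched roughly as follows. This is a minimal illustration under our own assumptions, not the paper's implementation: the feature map is a plain nested list indexed `[y][x][channel]`, junction features are bilinearly sampled at the junction coordinates, line features are averaged over evenly spaced points along each segment (in the spirit of line-of-interest pooling), and the names `bilinear_sample`, `sample_line_feature` and `build_wireframe_graph` are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple

Point = Tuple[float, float]
FeatureMap = List[List[List[float]]]  # assumed layout: fmap[y][x][channel]


def bilinear_sample(fmap: FeatureMap, x: float, y: float) -> List[float]:
    """Bilinearly interpolate the feature vector at continuous position (x, y)."""
    h, w = len(fmap), len(fmap[0])
    # Clamp the query point to the valid pixel range.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = x - x0, y - y0
    out = []
    for k in range(len(fmap[0][0])):
        top = fmap[y0][x0][k] * (1 - dx) + fmap[y0][x1][k] * dx
        bottom = fmap[y1][x0][k] * (1 - dx) + fmap[y1][x1][k] * dx
        out.append(top * (1 - dy) + bottom * dy)
    return out


def sample_line_feature(fmap: FeatureMap, p: Point, q: Point,
                        num_points: int = 8) -> List[float]:
    """Average features at num_points evenly spaced positions from p to q."""
    if num_points < 2:
        raise ValueError("num_points must be at least 2")
    acc = [0.0] * len(fmap[0][0])
    for i in range(num_points):
        t = i / (num_points - 1)
        x = p[0] + t * (q[0] - p[0])
        y = p[1] + t * (q[1] - p[1])
        for k, v in enumerate(bilinear_sample(fmap, x, y)):
            acc[k] += v
    return [v / num_points for v in acc]


def build_wireframe_graph(
    junctions: Sequence[Point],
    lines: Sequence[Tuple[int, int]],
    fmap: FeatureMap,
) -> Tuple[List[List[float]], Dict[int, Set[int]], Dict[Tuple[int, int], List[float]]]:
    """Build an undirected graph: junctions are nodes, lines are edges.

    Returns junction features, an adjacency map, and per-line features
    keyed by the sorted pair of junction indices.
    """
    node_feats = [bilinear_sample(fmap, x, y) for x, y in junctions]
    adjacency: Dict[int, Set[int]] = {i: set() for i in range(len(junctions))}
    edge_feats: Dict[Tuple[int, int], List[float]] = {}
    for a, b in lines:
        adjacency[a].add(b)  # undirected: store both directions
        adjacency[b].add(a)
        key = (min(a, b), max(a, b))
        edge_feats[key] = sample_line_feature(fmap, junctions[a], junctions[b])
    return node_feats, adjacency, edge_feats
```

In a full model, the junction and line features and the adjacency would then be passed to a graph network (for instance a heterogeneous graph transformer, which can treat junctions and lines as two node types) that infers the plane polygons; that step is not shown here.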