This work proposes a novel approach that uses a semantic segmentation mask to obtain the 2D spatial layout of the segmentation categories across a scene, encoded as segmentation-based semantic features (SSFs). For each segmentation category, these features represent the pixel count together with the mean 2D position and its standard deviation. A two-branch network, GS2F2App, is also proposed; it combines CNN-based global features extracted from RGB images with segmentation-based features extracted from the proposed SSFs. GS2F2App was evaluated on two indoor scene benchmark datasets, SUN RGB-D and NYU Depth V2, achieving state-of-the-art results on both.
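As an illustration of the per-category statistics described above, the following is a minimal sketch of how such features could be computed from an integer label mask with NumPy. The function name `segmentation_features`, the feature ordering, the use of raw pixel coordinates without normalization, and the choice to leave absent categories at zero are all assumptions made for this example, not details taken from the paper.

```python
import numpy as np


def segmentation_features(mask: np.ndarray, num_classes: int) -> np.ndarray:
    """Per-category statistics of a 2D integer label mask.

    Returns an array of shape (num_classes, 5) whose columns are
    [pixel count, mean x, mean y, std x, std y] in pixel coordinates.
    Categories absent from the mask keep all-zero rows (an assumption).
    """
    height, width = mask.shape
    # ys[i, j] = i (row index), xs[i, j] = j (column index)
    ys, xs = np.mgrid[0:height, 0:width]
    feats = np.zeros((num_classes, 5), dtype=np.float64)
    for c in range(num_classes):
        selected = mask == c
        count = int(selected.sum())
        if count == 0:
            continue  # category not present in this scene
        x, y = xs[selected], ys[selected]
        # np.std defaults to the population standard deviation (ddof=0)
        feats[c] = [count, x.mean(), y.mean(), x.std(), y.std()]
    return feats


if __name__ == "__main__":
    # Toy 2x3 mask with two of three hypothetical categories present.
    toy_mask = np.array([[0, 0, 1],
                         [0, 1, 1]])
    print(segmentation_features(toy_mask, num_classes=3))
```

Flattening the resulting array gives a fixed-length vector per image, which is one plausible way to feed such features to the segmentation branch of a two-branch network; the actual input encoding used by GS2F2App is not specified here.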