Semantic Validation in Structure from Motion

Structure from Motion (SfM) is the computer vision problem of recovering the 3D structure of a scene from projective measurements computed from a collection of 2D images taken from different viewpoints. SfM consists of three main steps: feature detection and matching, camera motion estimation, and recovery of the 3D structure from the estimated intrinsic and extrinsic camera parameters and the features. A known problem in SfM is that scenes lacking texture or containing repetitive features can cause erroneous feature matches between frames. Semantic segmentation, in which a deep convolutional neural network assigns a class label to each pixel of the input images, offers a route to validate and correct SfM models. The semantic and geometric properties associated with each class in the scene can be exploited to apply class-specific prior constraints. COLMAP was used as the SfM pipeline and DeepLab as the semantic segmentation pipeline. These, together with a planar reconstruction of the dense model, were used to identify erroneous points that may be occluded from the estimated camera position, given their semantic label and thus the prior constraint of the reconstructed plane. Herein, semantic segmentation is integrated into SfM to apply priors to the 3D point cloud based on the objects detected in the 2D input images. In addition, the semantic labels of matched keypoints are compared, and matches with inconsistent labels are discarded. Semantic labels on the input images are also used to remove objects associated with motion from the output SfM models. The proposed approach is evaluated on a dataset of 1102 images of a scene with repetitive architecture. This project offers a novel method for improved validation of 3D SfM models.

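The three semantic checks described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used with COLMAP and DeepLab in this work: segmentation maps are assumed to be per-image 2D arrays of class names indexed [row][col], keypoints are (x, y) pixel coordinates, the set of "dynamic" classes and the majority-vote rule for removing moving objects are hypothetical choices, and the reconstructed plane is taken as given in the form (a, b, c, d) with a*x + b*y + c*z + d = 0.

```python
from collections import Counter

# Hypothetical class names; real labels depend on the DeepLab model and the
# dataset it was trained on (typically integer class IDs, not strings).
DYNAMIC_CLASSES = {"person", "car"}


def label_at(seg, xy):
    """Label of the pixel containing keypoint xy = (x, y); seg is indexed [row][col]."""
    x, y = xy
    return seg[int(y)][int(x)]


def filter_matches(kps1, kps2, matches, seg1, seg2):
    """Keep a match (i, j) only if both keypoints carry the same semantic label."""
    return [(i, j) for i, j in matches
            if label_at(seg1, kps1[i]) == label_at(seg2, kps2[j])]


def drop_dynamic_points(point_labels, dynamic=DYNAMIC_CLASSES):
    """point_labels maps a 3D point id to the labels of its 2D observations.
    A point is kept unless the majority of its observations are dynamic
    (an assumed voting rule, for illustration only)."""
    kept = []
    for pid, labels in point_labels.items():
        if not labels:
            continue  # no observations to vote with: drop the point
        majority, _ = Counter(labels).most_common(1)[0]
        if majority not in dynamic:
            kept.append(pid)
    return kept


def plane_value(plane, p):
    """a*x + b*y + c*z + d: proportional to the signed distance of p from the
    plane (equal to it when (a, b, c) has unit length)."""
    a, b, c, d = plane
    return a * p[0] + b * p[1] + c * p[2] + d


def occluded_by_plane(plane, camera_center, point, eps=1e-9):
    """True if point lies strictly on the opposite side of the plane from the
    camera, i.e. an opaque, infinite plane would hide it from that camera."""
    s_cam = plane_value(plane, camera_center)
    s_pt = plane_value(plane, point)
    return (s_cam > eps and s_pt < -eps) or (s_cam < -eps and s_pt > eps)
```

Treating the plane as infinite keeps the test to a sign comparison; a fuller version would fit the plane to dense points of a planar class (for example with RANSAC) and only flag points whose camera ray crosses the plane within its reconstructed extent.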