Reconstructing urban areas in 3D from satellite raster images has been a long-standing and challenging goal of both academic and industrial research. The few methods that currently achieve this objective at Level of Detail 2 (LOD2) rely on procedural, geometry-based approaches and need stereo images and/or LIDAR data as input. We propose a method for urban 3D reconstruction named KIBS (\textit{Keypoints Inference By Segmentation}), which comprises two novel features: i) a fully deep-learning approach for the 3D detection of roof sections, and ii) a single (non-orthogonal) satellite raster image as the only model input. This is achieved in two steps: i) a Mask R-CNN model performs a 2D segmentation of the buildings' roof sections, and, after blending the segmented pixels into the RGB satellite raster image, ii) another identical Mask R-CNN model infers the heights-to-ground of the roof sections' corners via panoptic segmentation, leading to the full 3D reconstruction of the buildings and the city. We demonstrate the potential of the KIBS method by reconstructing different urban areas in a few minutes, with a Jaccard index for the 2D segmentation of individual roof sections of $88.55\%$ and $75.21\%$ on our two data sets, respectively, and a mean height error over the correctly segmented pixels for the 3D reconstruction of $1.60$ m and $2.06$ m on the same two data sets, respectively, hence within the LOD2 precision range.
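The two reported metrics can be illustrated with a minimal sketch. It assumes that the Jaccard index is the usual intersection-over-union of a predicted and a ground-truth roof-section mask, and that the height error is a mean absolute difference taken over pixels present in both masks; the abstract does not specify how instances are matched or how scores are averaged, so the function names `jaccard_index` and `mean_height_error` and these conventions are illustrative assumptions rather than the authors' evaluation code.

```python
import numpy as np


def jaccard_index(pred_mask, true_mask):
    """Intersection over union of two binary masks of equal shape."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    union = np.logical_or(pred, true).sum()
    if union == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return float(np.logical_and(pred, true).sum() / union)


def mean_height_error(pred_mask, true_mask, pred_height, true_height):
    """Mean absolute height difference over correctly segmented pixels.

    "Correctly segmented" is assumed here to mean pixels that belong to
    both the predicted and the ground-truth mask.
    """
    correct = np.logical_and(
        np.asarray(pred_mask, dtype=bool), np.asarray(true_mask, dtype=bool)
    )
    if not correct.any():
        return float("nan")  # no overlapping pixel to evaluate
    diff = np.asarray(pred_height, dtype=float) - np.asarray(true_height, dtype=float)
    return float(np.abs(diff[correct]).mean())
```

Both functions take per-pixel arrays of the same shape, with heights expressed in metres above ground, so a single roof section or a whole tile can be scored in the same way.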