In this work we present FreDSNet, a deep learning solution that obtains a semantic 3D understanding of indoor environments from a single panorama. Omnidirectional images offer task-specific advantages for scene understanding problems because they provide 360-degree contextual information about the entire environment. However, the inherent characteristics of omnidirectional images make it harder to obtain accurate object detection and segmentation or good depth estimation. To overcome these problems, we exploit convolutions in the frequency domain, which yield a wider receptive field in each convolutional layer. These convolutions make it possible to leverage the whole context information from omnidirectional images. FreDSNet is the first network that jointly provides monocular depth estimation and semantic segmentation from a single panoramic image by exploiting fast Fourier convolutions. Our experiments show that FreDSNet performs comparably to state-of-the-art methods dedicated to semantic segmentation or depth estimation. The FreDSNet code is publicly available at https://github.com/Sbrunoberenguel/FreDSNet
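The claim that a frequency-domain convolution widens the receptive field can be illustrated with a minimal NumPy sketch. It is not the FreDSNet implementation: the function name `fourier_unit`, the tensor shapes, and the random weights are assumptions chosen for illustration, and the layer is simplified (no normalization, no local convolution branch). The idea is that a pointwise (1x1) channel mixing applied to the 2D Fourier spectrum touches every spatial location at once, because each frequency coefficient depends on all pixels.

```python
import numpy as np


def fourier_unit(x, weight):
    """Simplified spectral convolution (illustrative, not the FreDSNet layer).

    x:      real feature map of shape (C_in, H, W)
    weight: hypothetical mixing matrix of shape (2 * C_out, 2 * C_in)
    """
    _, h, w = x.shape
    # Real 2D FFT over the spatial axes: shape (C_in, H, W // 2 + 1), complex.
    spec = np.fft.rfft2(x, axes=(-2, -1), norm="ortho")
    # Stack real and imaginary parts as separate real-valued channels.
    stacked = np.concatenate([spec.real, spec.imag], axis=0)
    # Pointwise (1x1) channel mixing in the frequency domain, then ReLU.
    mixed = np.maximum(np.einsum("oc,chw->ohw", weight, stacked), 0.0)
    # Rebuild a complex spectrum and return to the spatial domain.
    c_out = weight.shape[0] // 2
    out_spec = mixed[:c_out] + 1j * mixed[c_out:]
    return np.fft.irfft2(out_spec, s=(h, w), axes=(-2, -1), norm="ortho")


# Toy input with a 2:1 aspect ratio, as in an equirectangular panorama.
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8, 16))
weight = rng.standard_normal((4, 4))  # 2 input channels -> 2 output channels

base = fourier_unit(x, weight)

# Perturb a single input pixel and see how many output values react.
x_perturbed = x.copy()
x_perturbed[0, 0, 0] += 1.0
changed = np.abs(fourier_unit(x_perturbed, weight) - base) > 1e-9
print(changed.sum(), "of", changed.size, "output values changed")
```

A spatial 3x3 convolution would let that single pixel influence at most nine output positions per channel, whereas here the perturbation spreads across the whole feature map after one layer.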