Estimating the layout of a room from a single panoramic image is important in virtual/augmented reality and furniture layout simulation. The task involves identifying three-dimensional (3D) geometry, such as the locations of corners and boundaries, and performing 3D reconstruction. Occlusion is a common problem that degrades room layout estimation, yet it has not been thoroughly studied to date. Because 3D shape information about rooms is available as building drawings and as corner coordinates in image datasets, we propose providing both the 2D panoramic image and 3D information to a model to deal with occlusion effectively. However, simply feeding 3D information to a model is not sufficient for it to exploit the shape information in an occluded area. We therefore improve the model by introducing a 3D Intersection over Union (IoU) loss so that the 3D information is used effectively. In some cases, drawings are not available or the construction deviates from the drawing. To handle such practical cases, we propose a method for distilling knowledge from a model trained with both images and 3D information into a model that takes only images as input. The proposed model, called Shape-Net, achieves state-of-the-art (SOTA) performance on benchmark datasets. We also confirm its effectiveness in dealing with occlusion: its accuracy on images with occlusion is significantly higher than that of existing models.
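As a rough illustration of what a 3D IoU loss measures, the following minimal sketch computes `1 - IoU` between a predicted and a ground-truth room volume. It is not the Shape-Net implementation: it assumes, purely for simplicity, that each room is an axis-aligned cuboid given as `(xmin, ymin, zmin, xmax, ymax, zmax)`, whereas real layouts are general floor polygons with a height. The names `Box`, `box_volume`, `iou_3d`, and `iou_3d_loss` are hypothetical, and a trainable version would need a differentiable implementation in a deep learning framework.

```python
from typing import Tuple

# Assumed representation (not from the paper): an axis-aligned cuboid
# stored as (xmin, ymin, zmin, xmax, ymax, zmax).
Box = Tuple[float, float, float, float, float, float]


def box_volume(box: Box) -> float:
    """Volume of a cuboid; degenerate or inverted boxes count as zero."""
    x0, y0, z0, x1, y1, z1 = box
    return max(x1 - x0, 0.0) * max(y1 - y0, 0.0) * max(z1 - z0, 0.0)


def iou_3d(pred: Box, target: Box) -> float:
    """Intersection over Union of two axis-aligned cuboids."""
    intersection = 1.0
    for axis in range(3):
        lower = max(pred[axis], target[axis])          # larger of the two minima
        upper = min(pred[axis + 3], target[axis + 3])  # smaller of the two maxima
        intersection *= max(upper - lower, 0.0)        # zero if no overlap on this axis
    union = box_volume(pred) + box_volume(target) - intersection
    return intersection / union if union > 0.0 else 0.0


def iou_3d_loss(pred: Box, target: Box) -> float:
    """Loss that is 0 for a perfect match and 1 for disjoint volumes."""
    return 1.0 - iou_3d(pred, target)
```

Unlike a per-corner coordinate error, this quantity depends on the whole enclosed volume, so a prediction is penalized for a misplaced wall even when the corresponding corner is hidden in the image.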