For augmented reality (AR), it is important that virtual assets appear to "sit among" real-world objects. The virtual element should variously occlude and be occluded by real matter, based on a plausible depth ordering. This occlusion should be consistent over time as the viewer's camera moves. Unfortunately, small mistakes in the estimated scene depth can ruin the downstream occlusion mask, and thereby the AR illusion. Especially in real-time settings, depths inferred near boundaries or across time can be inconsistent. In this paper, we challenge the need for depth regression as an intermediate step. We instead propose an implicit model for depth and use that to predict the occlusion mask directly. The inputs to our network are one or more color images, plus the known depths of any virtual geometry. We show that our occlusion predictions are more accurate and more temporally stable than predictions derived from traditional depth-estimation models. We obtain state-of-the-art occlusion results on the challenging ScanNetv2 dataset and superior qualitative results on real scenes.
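To make the failure mode of the depth-regression route concrete, the following is a minimal sketch of the conventional baseline only: an occlusion mask obtained by comparing an estimated real-scene depth map against the known virtual depth, pixel by pixel. It is not the implicit model proposed in the paper. The function names, the toy depth values, and the convention that pixels without virtual geometry hold an infinite depth are all assumptions made for illustration.

```python
import numpy as np


def occlusion_mask_from_depth(real_depth, virtual_depth):
    """Baseline mask: True where the virtual asset should be drawn.

    Assumption for this sketch: pixels with no virtual geometry carry
    a virtual depth of +inf, so they are never marked as visible.
    """
    has_virtual = np.isfinite(virtual_depth)
    return has_virtual & (virtual_depth < real_depth)


def composite(real_rgb, virtual_rgb, mask):
    """Draw virtual pixels where the mask is True, real pixels elsewhere."""
    return np.where(mask[..., None], virtual_rgb, real_rgb)


# Toy scene: a real wall at 2.00 m, a virtual object just in front at 1.98 m.
true_depth = np.full((2, 3), 2.00)
virtual_depth = np.full((2, 3), 1.98)
virtual_depth[0, 0] = np.inf  # no virtual geometry at this pixel

# A hypothetical depth estimate with a 5 cm error at a single pixel.
estimated_depth = true_depth.copy()
estimated_depth[1, 1] -= 0.05

mask_true = occlusion_mask_from_depth(true_depth, virtual_depth)
mask_est = occlusion_mask_from_depth(estimated_depth, virtual_depth)

# Number of pixels whose occlusion decision changed because of the error.
flipped = int((mask_true != mask_est).sum())

real_rgb = np.zeros((2, 3, 3), dtype=np.uint8)
virtual_rgb = np.full((2, 3, 3), 255, dtype=np.uint8)
frame = composite(real_rgb, virtual_rgb, mask_est)
```

In this toy case a depth error smaller than the gap between the real and virtual surfaces would leave the mask unchanged, while a slightly larger one flips the binary decision at that pixel. This sensitivity near the decision boundary is what motivates predicting the mask directly from the images and the virtual depth.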