An increasing number of applications rely on data-driven models that are deployed for perception tasks across a sequence of scenes. Due to the mismatch between training and deployment data, adapting the model to the new scenes is often crucial to obtain good performance. In this work, we study continual multi-scene adaptation for the task of semantic segmentation, assuming that no ground-truth labels are available during deployment and that performance on the previous scenes should be maintained. We propose training a Semantic-NeRF network for each scene by fusing the predictions of a segmentation model and then using the view-consistent rendered semantic labels as pseudo-labels to adapt the model. Through joint training with the segmentation model, the Semantic-NeRF model effectively enables 2D-3D knowledge transfer. Furthermore, due to its compact size, it can be stored in a long-term memory and subsequently used to render data from arbitrary viewpoints to reduce forgetting. We evaluate our approach on ScanNet, where we outperform both a voxel-based baseline and a state-of-the-art unsupervised domain adaptation method.