Memory-based Adapters for Online 3D Scene Perception

In this paper, we propose a new framework for online 3D scene perception. Conventional 3D scene perception methods are offline, i.e., they take an already reconstructed 3D scene geometry as input. This is not applicable to robotic applications, where the input is a streaming RGB-D video rather than a complete 3D scene reconstructed from pre-collected RGB-D videos. To handle online 3D scene perception tasks, where data collection and perception must be performed simultaneously, the model should process 3D scenes frame by frame and exploit temporal information. To this end, we propose an adapter-based plug-and-play module for the backbone of a 3D scene perception model, which constructs a memory to cache and aggregate the extracted RGB-D features and thereby equips offline models with temporal learning ability. Specifically, we propose a queued memory mechanism to cache the supporting point cloud and image features. We then devise aggregation modules that operate directly on the memory and pass temporal information to the current frame. We further propose a 3D-to-2D adapter to enhance image features with strong global context. Our adapters can be easily inserted into mainstream offline architectures for different tasks and significantly boost their performance on online tasks. Extensive experiments on the ScanNet and SceneNN datasets demonstrate that our approach achieves leading performance on three 3D scene perception tasks compared with state-of-the-art online methods by simply finetuning existing offline models, without any model- or task-specific designs. \href{https://xuxw98.github.io/Online3D/}{Project page}.
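
The queued memory idea described above can be illustrated with a minimal sketch. This is not the paper's implementation: the class name `QueuedMemory`, the function `run_online`, the use of plain Python lists as per-frame features, and the choice of mean pooling followed by residual addition as the aggregation step are all assumptions made for illustration. The actual aggregation modules are learned and operate on point cloud and image features.

```python
from collections import deque


class QueuedMemory:
    """Fixed-length FIFO cache of per-frame feature vectors (hypothetical helper)."""

    def __init__(self, max_frames):
        # A deque with maxlen discards the oldest entry automatically when full.
        self.queue = deque(maxlen=max_frames)

    def push(self, feature):
        # Store a copy so later changes to the caller's list do not alter the cache.
        self.queue.append(list(feature))

    def aggregate(self, current):
        """Fuse the current feature with the mean of the cached features."""
        if not self.queue:
            # First frame: no temporal context is available yet.
            return list(current)
        n = len(self.queue)
        # Element-wise mean over cached frames (stand-in for a learned module).
        mean = [sum(values) / n for values in zip(*self.queue)]
        # Residual fusion: current feature plus the pooled temporal context.
        return [c + m for c, m in zip(current, mean)]


def run_online(frames, max_frames=3):
    """Process features frame by frame, reading from and then updating the memory."""
    memory = QueuedMemory(max_frames)
    outputs = []
    for feature in frames:
        outputs.append(memory.aggregate(feature))
        memory.push(feature)
    return outputs
```

In this sketch each frame sees only the most recent `max_frames` cached features, so memory use stays bounded however long the stream runs. Whether the real adapters aggregate before or after caching the current frame is not stated in the abstract; the order here is an arbitrary choice.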
