Segmenting unseen objects is a crucial ability for robots, which may encounter new environments during operation. A popular recent solution is to leverage RGB-D features learned from large-scale synthetic data and apply the resulting model directly to unseen real-world scenarios. However, the domain shift caused by the sim2real gap is inevitable and poses a crucial challenge to the segmentation model. In this paper, we emphasize the adaptation process across sim2real domains and model it as a learning problem on the BatchNorm parameters of a simulation-trained model. Specifically, we propose a novel non-parametric entropy objective, which formulates the learning objective for test-time adaptation in an open-world manner. We further design a cross-modality knowledge distillation objective to encourage test-time knowledge transfer for feature enhancement. Our approach can be implemented efficiently with only test images, without requiring annotations or revisiting the large-scale synthetic training data. Besides significant time savings, the proposed method consistently improves segmentation results on the overlap and boundary metrics, achieving state-of-the-art performance on unseen object instance segmentation.
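To make the idea of adapting only BatchNorm parameters at test time more concrete, the following is a minimal toy sketch. It rests on several assumptions and is not the method proposed in the paper: it uses the standard Shannon entropy of softmax predictions as a stand-in for the non-parametric entropy objective, omits the cross-modality distillation objective entirely, and replaces the segmentation network with a single scalar normalization layer followed by a frozen two-class head. All names (`batch_norm`, `FROZEN_W`, `mean_entropy`, `adapt`) are hypothetical, and gradients are estimated by finite differences so that only the Python standard library is needed.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a short list of logits.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    # Shannon entropy (assumption: a stand-in for the paper's objective).
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def batch_norm(xs, gamma, beta, eps=1e-5):
    # Normalize with the statistics of the current test batch, then apply
    # the learnable affine parameters gamma (scale) and beta (shift).
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in xs]

# Hypothetical frozen two-class head; it is never updated during adaptation.
FROZEN_W = [1.0, -1.0]

def mean_entropy(xs, gamma, beta):
    # Average prediction entropy over the unlabeled test batch.
    feats = batch_norm(xs, gamma, beta)
    return sum(entropy(softmax([w * f for w in FROZEN_W])) for f in feats) / len(feats)

def adapt(xs, gamma=1.0, beta=0.0, lr=0.5, steps=50, h=1e-4):
    # Gradient descent on gamma and beta only, using central finite
    # differences in place of automatic differentiation.
    for _ in range(steps):
        g_gamma = (mean_entropy(xs, gamma + h, beta)
                   - mean_entropy(xs, gamma - h, beta)) / (2 * h)
        g_beta = (mean_entropy(xs, gamma, beta + h)
                  - mean_entropy(xs, gamma, beta - h)) / (2 * h)
        gamma -= lr * g_gamma
        beta -= lr * g_beta
    return gamma, beta

# Unlabeled "test" features; no annotations or training data are used.
test_batch = [0.2, -0.1, 0.4, -0.3, 0.1, -0.5]
before = mean_entropy(test_batch, 1.0, 0.0)
gamma, beta = adapt(test_batch)
after = mean_entropy(test_batch, gamma, beta)
print(f"mean entropy before: {before:.4f}, after: {after:.4f}")
```

In this toy setting the average prediction entropy on the test batch drops after adaptation while the head stays frozen, which mirrors the abstract's point that adaptation needs only test images. In a real network the same pattern would typically be expressed by marking only the normalization layers' affine parameters as trainable and letting an autograd framework compute the gradients.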