Test-Time Training (TTT) adapts a pre-trained network to changing data distributions on the fly. In this work, we propose TTT-KD, the first TTT method for 3D semantic segmentation, which uses Knowledge Distillation (KD) from foundation models (e.g., DINOv2) as a self-supervised objective for adapting to distribution shifts at test time. Given access to paired image-point cloud (2D-3D) data, we first optimize a 3D segmentation backbone for two tasks: the main task of semantic segmentation on the point clouds, and 2D $\to$ 3D KD from an off-the-shelf pre-trained 2D foundation model. At test time, TTT-KD updates the 3D segmentation backbone for each test sample using the self-supervised KD task before making the final prediction. Extensive evaluations on multiple indoor and outdoor 3D segmentation benchmarks show the utility of TTT-KD: it improves performance on both in-distribution (ID) and out-of-distribution (OOD) test datasets. We achieve a gain of up to 13% mIoU (7% on average) when the train and test distributions are similar, and up to 45% (20% on average) when adapting to OOD test samples.
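The test-time procedure described above can be illustrated with a toy sketch. It is not the TTT-KD implementation: the backbone here is a single linear map rather than a 3D network, the distillation loss is assumed to be a mean squared error, and the names `kd_loss` and `ttt_kd_predict` are hypothetical. The sketch also assumes that adaptation starts from the trained weights for every test sample, which the text above does not specify. What it shows is only the order of operations: a few gradient steps on the self-supervised KD objective for one test sample, followed by the segmentation prediction with the adapted backbone.

```python
import numpy as np


def kd_loss(backbone, kd_head, points, feats_2d):
    """MSE between projected 3D features and the 2D teacher features (assumed loss)."""
    pred = points @ backbone @ kd_head
    return float(np.mean((pred - feats_2d) ** 2))


def ttt_kd_predict(backbone, seg_head, kd_head, points, feats_2d, steps=20, lr=0.5):
    """Adapt a copy of a toy linear backbone on one test sample, then segment it.

    points:   (N, d_in) per-point inputs
    feats_2d: (N, d_2d) teacher features assumed to be already paired with the points
    Returns the per-point class predictions and the adapted backbone weights.
    """
    w = backbone.copy()  # assumption: every sample starts from the trained weights
    n = feats_2d.size
    for _ in range(steps):
        residual = points @ w @ kd_head - feats_2d            # (N, d_2d)
        grad = (2.0 / n) * points.T @ residual @ kd_head.T    # gradient of the MSE w.r.t. w
        w -= lr * grad                                        # only the backbone is updated
    logits = points @ w @ seg_head                            # final prediction after adaptation
    return logits.argmax(axis=1), w


# Toy usage with random data standing in for a real scan and real teacher features.
rng = np.random.default_rng(0)
points = rng.normal(size=(64, 3))
feats_2d = rng.normal(size=(64, 16))
backbone = 0.1 * rng.normal(size=(3, 8))
seg_head = 0.1 * rng.normal(size=(8, 5))
kd_head = 0.1 * rng.normal(size=(8, 16))

labels, adapted = ttt_kd_predict(backbone, seg_head, kd_head, points, feats_2d)
print("KD loss before:", kd_loss(backbone, kd_head, points, feats_2d))
print("KD loss after: ", kd_loss(adapted, kd_head, points, feats_2d))
```

In this sketch the segmentation and KD heads stay fixed during adaptation and only the shared backbone changes, so any improvement in the segmentation output comes from the features the KD objective reshapes.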