Revisiting Realistic Test-Time Training: Sequential Inference and Adaptation by Anchored Clustering Regularized Self-Training

Deploying models on target domain data subject to distribution shift requires adaptation. Test-time training (TTT) has emerged as a solution to this problem under a realistic scenario in which the full source domain data is not accessible and instant inference on the target domain is required. Despite extensive work on TTT, experimental settings remain inconsistent across studies, which leads to unfair comparisons. In this work, we first revisit TTT assumptions and categorize TTT protocols by two key factors. Among these protocols, we adopt a realistic sequential test-time training (sTTT) protocol, under which we develop a test-time anchored clustering (TTAC) approach to enable stronger test-time feature learning. TTAC discovers clusters in both the source and target domains and matches the target clusters to the source ones to improve adaptation. When source domain information is strictly absent (i.e., source-free), we further develop an efficient method to infer source domain distributions for anchored clustering. Finally, self-training (ST) has demonstrated great success in learning from unlabeled data, yet we find empirically that applying ST alone to TTT is prone to confirmation bias. We therefore introduce a more effective TTT approach that regularizes self-training with anchored clustering, and we refer to the improved model as TTAC++. We demonstrate that, under all TTT protocols, TTAC++ consistently outperforms state-of-the-art methods on five TTT datasets, covering corrupted target domains, selected hard samples, synthetic-to-real adaptation, and adversarially attacked target domains. We hope this work provides a fair benchmark for TTT methods and that future research will compare methods within their respective protocols.
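
As an illustration of the cluster-matching idea described above, the following is a minimal sketch and not the authors' implementation. The abstract does not state the matching objective, so the sketch assumes that each cluster is summarized by a Gaussian (mean and covariance of its features), that target cluster assignments come from pseudo-labels, and that target clusters are pulled toward their source "anchors" by a KL divergence. All function names (`gaussian_kl`, `cluster_stats`, `anchored_clustering_loss`) are hypothetical.

```python
import numpy as np


def gaussian_kl(mu_p, cov_p, mu_q, cov_q):
    """KL( N(mu_p, cov_p) || N(mu_q, cov_q) ) for full-covariance Gaussians."""
    d = mu_p.shape[0]
    cov_q_inv = np.linalg.inv(cov_q)
    diff = mu_q - mu_p
    # slogdet is used instead of log(det(...)) for numerical stability.
    _, logdet_p = np.linalg.slogdet(cov_p)
    _, logdet_q = np.linalg.slogdet(cov_q)
    return 0.5 * (
        np.trace(cov_q_inv @ cov_p)
        + diff @ cov_q_inv @ diff
        - d
        + logdet_q
        - logdet_p
    )


def cluster_stats(features, labels, num_classes, eps=1e-3):
    """Per-cluster mean and covariance; labels are (pseudo-)labels, an assumption."""
    d = features.shape[1]
    stats = {}
    for k in range(num_classes):
        x = features[labels == k]
        if len(x) < 2:  # too few samples to estimate a covariance
            continue
        mu = x.mean(axis=0)
        # eps * I keeps the covariance invertible (assumed regularizer).
        cov = np.atleast_2d(np.cov(x, rowvar=False)) + eps * np.eye(d)
        stats[k] = (mu, cov)
    return stats


def anchored_clustering_loss(target_stats, source_stats):
    """Sum of KL divergences from each target cluster to its source anchor."""
    shared = sorted(set(target_stats) & set(source_stats))
    return sum(gaussian_kl(*target_stats[k], *source_stats[k]) for k in shared)
```

In a test-time setting, the source statistics would be computed once (or inferred, in the source-free case) and kept fixed, while the target statistics would be updated as test batches arrive; this sketch only shows how the discrepancy between the two sets of clusters could be scored under the stated assumptions.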
