Un-Mixing Test-Time Normalization Statistics: Combatting Label Temporal Correlation

In an era where test-time adaptation methods increasingly rely on adjusting batch normalization (BN) parameters, one critical assumption often goes overlooked: that test batches are independent and identically distributed (i.i.d.) with respect to the unknown labels. Under non-i.i.d. conditions, this assumption leads to biased estimates of BN statistics and jeopardizes system stability. This paper departs from the i.i.d. paradigm by introducing a strategy termed "Un-Mixing Test-Time Normalization Statistics" (UnMix-TNS). For each instance in a test batch, UnMix-TNS re-calibrates the statistics used for normalization by mixing the instance-wise statistics with multiple unmixed statistics components, thereby simulating an i.i.d. environment. The key is an online unmixing procedure that continually refines these statistics components using the closest instances from each incoming test batch. UnMix-TNS is generic by design and integrates with a range of state-of-the-art test-time adaptation methods and pre-trained architectures equipped with BN layers. Empirical evaluations confirm its robustness across scenarios ranging from single to continual and mixed domain shifts. UnMix-TNS is particularly effective on test data streams with label temporal correlation, including corrupted real-world non-i.i.d. streams, and it remains effective with very small batch sizes and even individual samples. Across multiple benchmarks, the results show significant improvements in both the stability and the performance of test-time adaptation.
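The abstract describes the mechanism only at a high level. The pure-Python sketch below illustrates the general idea for a single BN layer under several assumptions that are not taken from the paper: K components initialized as shifted copies of the source BN statistics (hypothetical `spread`), hard assignment of each instance to the component whose per-channel mean is nearest in Euclidean distance, an exponential-moving-average update with a hypothetical `momentum`, and equal-weight mixing of an instance's own statistics with all K components. The names `UnMixStatsSketch`, `channel_stats`, and `mix_moments` are illustrative only and are not the authors' code.

```python
import math


def channel_stats(instance):
    """Per-channel mean and (biased) variance of one instance.

    `instance` is a list of C channels, each a list of spatial activations.
    """
    means, variances = [], []
    for values in instance:
        m = sum(values) / len(values)
        means.append(m)
        variances.append(sum((v - m) ** 2 for v in values) / len(values))
    return means, variances


def mix_moments(means_list, vars_list):
    """Per-channel mean/variance of an equal-weight mixture (law of total variance)."""
    n = len(means_list)
    mix_mean, mix_var = [], []
    for c in range(len(means_list[0])):
        mu = sum(m[c] for m in means_list) / n
        second = sum(v[c] + m[c] ** 2 for m, v in zip(means_list, vars_list)) / n
        mix_mean.append(mu)
        mix_var.append(max(0.0, second - mu ** 2))  # guard tiny negative round-off
    return mix_mean, mix_var


class UnMixStatsSketch:
    """Hypothetical sketch: K statistics components for one BN layer."""

    def __init__(self, src_mean, src_var, num_components=3, spread=0.5, momentum=0.1):
        K = num_components
        self.momentum = momentum
        # Assumption: components start as copies of the source stats, with means
        # shifted symmetrically by multiples of the source standard deviation.
        self.means = [
            [mu + (k - (K - 1) / 2) * spread * math.sqrt(var)
             for mu, var in zip(src_mean, src_var)]
            for k in range(K)
        ]
        self.vars = [list(src_var) for _ in range(K)]

    def closest_component(self, inst_mean):
        # Squared Euclidean distance between instance mean and each component mean.
        dists = [sum((a - b) ** 2 for a, b in zip(inst_mean, comp)) for comp in self.means]
        return dists.index(min(dists))

    def update(self, stats):
        # Online unmixing (sketch): each component moves toward the pooled
        # statistics of the instances that are closest to it in this batch.
        assigned = {}
        for inst_mean, inst_var in stats:
            k = self.closest_component(inst_mean)
            assigned.setdefault(k, []).append((inst_mean, inst_var))
        m = self.momentum
        for k, members in assigned.items():
            pooled_mean, pooled_var = mix_moments(
                [mu for mu, _ in members], [v for _, v in members])
            self.means[k] = [(1 - m) * a + m * b for a, b in zip(self.means[k], pooled_mean)]
            self.vars[k] = [(1 - m) * a + m * b for a, b in zip(self.vars[k], pooled_var)]

    def normalize(self, batch, eps=1e-5):
        stats = [channel_stats(x) for x in batch]
        self.update(stats)
        out = []
        for x, (inst_mean, inst_var) in zip(batch, stats):
            # Mix the instance's own stats with all K components (equal weights, assumption).
            mu, var = mix_moments([inst_mean] + self.means, [inst_var] + self.vars)
            out.append([[(v - mu[c]) / math.sqrt(var[c] + eps) for v in values]
                        for c, values in enumerate(x)])
        return out


layer = UnMixStatsSketch(src_mean=[0.0, 0.0], src_var=[1.0, 1.0], num_components=3)
batch = [
    [[0.1, 0.3, 0.2, 0.4], [1.0, 1.2, 0.8, 1.0]],      # instance 1: 2 channels x 4 positions
    [[2.0, 2.2, 1.8, 2.0], [-1.0, -0.8, -1.2, -1.0]],  # instance 2
]
normalized = layer.normalize(batch)
```

In this sketch an instance's own statistics enter the mixture with weight 1/(K+1), so a batch drawn from a single class, or consisting of a single sample, cannot fully dictate the normalization statistics; the components retain information accumulated from earlier batches. The learnable BN affine parameters are omitted for brevity.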
