Deep neural networks are useful in many different tasks, but their performance can be severely degraded by changes in the data distribution. In the biomedical field, for example, performance can suffer when the data differ between training and test datasets (e.g., different machines or populations). To ensure robustness and generalization to real-world scenarios, test-time adaptation (TTA) has recently been studied as a way to adjust models to a new data distribution during inference. Test-time batch normalization is a simple and popular method that has achieved compelling performance on domain shift benchmarks. It is implemented by recalculating batch normalization statistics on test batches. Prior work has focused on analysis with test data that has the same label distribution as the training data. However, in many practical applications this technique is vulnerable to label distribution shifts, sometimes failing catastrophically. This poses a risk when applying TTA methods in deployment. We propose to tackle this challenge by selectively adapting only some channels in a deep network, minimizing drastic adaptation that is sensitive to label shifts. Our selection scheme is based on two principles that we motivate empirically: (1) later layers of networks are more sensitive to label shift, and (2) individual features can be sensitive to specific classes. We apply the proposed technique to three classification tasks (CIFAR10-C, ImageNet-C, and fatty liver diagnosis), exploring both covariate and label distribution shifts. We find that our method retains the benefits of TTA while significantly reducing the risk of failure common in other methods, and that it is robust to the choice of hyperparameters.