Deep neural networks are useful in many tasks, but their performance can degrade severely when the data distribution changes. In the biomedical field, for example, performance can suffer when training and test datasets differ in acquisition machines or populations. To ensure robustness and generalization in real-world scenarios, test-time adaptation (TTA) has recently been studied as a way to adjust models to a new data distribution during inference. Test-time batch normalization, which recomputes batch normalization statistics on test batches, is a simple and popular method that has achieved compelling performance on domain shift benchmarks. Prior work has focused on test data with the same label distribution as the training data. In many practical applications, however, this technique is vulnerable to label distribution shift and sometimes fails catastrophically, which makes deploying TTA methods risky. We propose to tackle this challenge by adapting only selected channels in a deep network, minimizing drastic adaptation that is sensitive to label shift. Our selection scheme is based on two principles that we motivate empirically: (1) later layers of a network are more sensitive to label shift, and (2) individual features can be sensitive to specific classes. We apply the proposed technique to three classification tasks, CIFAR10-C, ImageNet-C, and fatty liver diagnosis, and explore both covariate and label distribution shifts. We find that our method retains the benefits of TTA while significantly reducing the risk of failure common in other methods, and that it is robust to the choice of hyperparameters.
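As a rough illustration of the mechanism described in the abstract, the sketch below normalizes a batch of features per channel, using statistics recomputed on the test batch for some channels and the stored training statistics for the rest. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the `[N][C]` feature layout, and the boolean `adapt_mask` are hypothetical, the learned scale and shift of batch normalization are omitted, and the abstract does not specify how channels are actually selected.

```python
from statistics import fmean, pvariance


def batch_stats(batch):
    """Per-channel mean and population variance of a batch shaped [N][C]."""
    channels = list(zip(*batch))  # transpose to [C][N]
    return [fmean(c) for c in channels], [pvariance(c) for c in channels]


def selective_test_time_bn(batch, train_mean, train_var, adapt_mask, eps=1e-5):
    """Normalize `batch` channel by channel.

    Channels with adapt_mask[c] == True use statistics recomputed on the
    test batch (test-time batch normalization); the others keep the
    statistics stored from training. The mask is a hypothetical input:
    how it is chosen is an assumption left open here.
    """
    test_mean, test_var = batch_stats(batch)
    mean = [t if a else s for t, s, a in zip(test_mean, train_mean, adapt_mask)]
    var = [t if a else s for t, s, a in zip(test_var, train_var, adapt_mask)]
    return [
        [(x - m) / (v + eps) ** 0.5 for x, m, v in zip(row, mean, var)]
        for row in batch
    ]


# Toy usage: two samples, two channels; adapt channel 0, freeze channel 1.
features = [[2.0, 10.0], [4.0, 14.0]]
out = selective_test_time_bn(
    features,
    train_mean=[0.0, 0.0],
    train_var=[1.0, 1.0],
    adapt_mask=[True, False],
)
```

In this toy call, channel 0 is re-centered around the test-batch mean, while channel 1 keeps its offset relative to the training statistics. That offset is the kind of information full re-normalization would erase, which is one way to picture why adapting every channel can be harmful when the test label distribution differs from training.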