RLSbench: Domain Adaptation Under Relaxed Label Shift

Despite the emergence of principled methods for domain adaptation under label shift, the sensitivity of these methods to minor shifts in the class-conditional distributions remains precariously underexplored. Meanwhile, popular deep domain adaptation heuristics tend to falter when faced with shifts in label proportions. While several papers attempt to adapt these heuristics to accommodate shifts in label proportions, inconsistencies in evaluation criteria, datasets, and baselines make it hard to assess the state of the art. In this paper, we introduce RLSbench, a large-scale relaxed label shift benchmark consisting of >500 distribution shift pairs that draw on 14 datasets across vision, tabular, and language modalities and compose them with varying label proportions. First, we evaluate 13 popular domain adaptation methods, demonstrating more widespread failures under label proportion shifts than previously known. Next, we develop an effective two-step meta-algorithm that is compatible with most deep domain adaptation heuristics: (i) pseudo-balance the data at each epoch; and (ii) adjust the final classifier with (an estimate of) the target label distribution. The meta-algorithm often improves existing domain adaptation heuristics by 2 to 10 percentage points of accuracy under extreme label proportion shifts and has little (i.e., <0.5%) effect when label proportions do not shift. We hope that these findings and the availability of RLSbench will encourage researchers to rigorously evaluate proposed methods in relaxed label shift settings. Code is publicly available at https://github.com/acmi-lab/RLSbench.
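
The two steps of the meta-algorithm can be illustrated with a short NumPy sketch. It rests on stated assumptions and is not the RLSbench implementation. The helper names (`pseudo_balanced_indices`, `estimate_target_prior`, `adjust_posteriors`) are hypothetical. Step (i) is read here as resampling so that each pseudo-label class is drawn equally often; for labeled source data, one would balance by true labels instead. The target label distribution is estimated with the EM procedure of Saerens et al. (2002). That estimator is swapped in as one standard choice, and it assumes calibrated source posteriors. The toy data at the end is synthetic.

```python
import numpy as np


def pseudo_balanced_indices(probs, rng):
    """Step (i), hypothetical helper: resample indices so that every
    pseudo-label class is drawn equally often.

    `probs` is an (n, k) array of predicted class probabilities; the
    pseudo-labels are the argmax predictions.
    """
    pseudo = probs.argmax(axis=1)
    counts = np.bincount(pseudo, minlength=probs.shape[1])
    # Weight each example by 1 / (size of its pseudo-class), so every
    # non-empty pseudo-class receives the same total sampling mass.
    weights = 1.0 / counts[pseudo]
    weights /= weights.sum()
    return rng.choice(len(pseudo), size=len(pseudo), replace=True, p=weights)


def estimate_target_prior(probs, source_prior, n_iter=500, tol=1e-10):
    """Estimate q_t(y) with EM (Saerens et al., 2002), an assumed estimator.

    Assumes `probs` are calibrated source posteriors p_s(y|x) evaluated
    on unlabeled target inputs.
    """
    q = source_prior.copy()
    for _ in range(n_iter):
        # E-step: reweight posteriors by the current prior ratio, renormalize.
        post = probs * (q / source_prior)
        post /= post.sum(axis=1, keepdims=True)
        # M-step: the new prior estimate is the mean adjusted posterior.
        q_new = post.mean(axis=0)
        converged = np.abs(q_new - q).max() < tol
        q = q_new
        if converged:
            break
    return q


def adjust_posteriors(probs, source_prior, target_prior):
    """Step (ii): p_t(y|x) is proportional to p_s(y|x) * q_t(y) / q_s(y)."""
    adj = probs * (target_prior / source_prior)
    return adj / adj.sum(axis=1, keepdims=True)


# Synthetic toy check: classes are N(-1, 1) and N(+1, 1). A balanced source
# prior gives the exact posterior p_s(y=1|x) = sigmoid(2x).
rng = np.random.default_rng(0)
true_prior = np.array([0.8, 0.2])
n = 20_000
y = rng.choice(2, size=n, p=true_prior)
x = rng.normal(loc=2.0 * y - 1.0, scale=1.0)
p1 = 1.0 / (1.0 + np.exp(-2.0 * x))
probs = np.stack([1.0 - p1, p1], axis=1)
source_prior = np.array([0.5, 0.5])

idx = pseudo_balanced_indices(probs, rng)
balanced_frac = probs[idx].argmax(axis=1).mean()  # share of pseudo-class 1 after resampling
q_hat = estimate_target_prior(probs, source_prior)
adjusted = adjust_posteriors(probs, source_prior, q_hat)
acc_raw = (probs.argmax(axis=1) == y).mean()
acc_adj = (adjusted.argmax(axis=1) == y).mean()
print(balanced_frac, q_hat, acc_raw, acc_adj)
```

On this toy data, the estimated prior should land near the true [0.8, 0.2], and the adjusted argmax should be more accurate than the unadjusted one. If the target prior instead equals the balanced source prior, EM stays near uniform and the adjustment is close to the identity. This matches the abstract's observation that the correction has little effect when label proportions do not shift.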
