RLSbench: Domain Adaptation Under Relaxed Label Shift

Despite the emergence of principled methods for domain adaptation under label shift, their sensitivity to shifts in class-conditional distributions remains underexplored. Meanwhile, popular deep domain adaptation heuristics tend to falter when faced with label proportion shifts. While several papers modify these heuristics in attempts to handle label proportion shifts, inconsistencies in evaluation standards, datasets, and baselines make it difficult to gauge the current best practices. In this paper, we introduce RLSbench, a large-scale benchmark for relaxed label shift, consisting of $>$500 distribution shift pairs spanning vision, tabular, and language modalities, with varying label proportions. Unlike existing benchmarks, which primarily focus on shifts in the class-conditional distribution $p(x|y)$, our benchmark also focuses on shifts in the label marginal $p(y)$. First, we assess 13 popular domain adaptation methods, demonstrating more widespread failures under label proportion shifts than were previously known. Next, we develop an effective two-step meta-algorithm that is compatible with most domain adaptation heuristics: (i) pseudo-balance the data at each epoch; and (ii) adjust the final classifier with an estimate of the target label distribution. The meta-algorithm improves existing domain adaptation heuristics under large label proportion shifts, often by 2--10\% accuracy points, while having minimal effect ($<$0.5\%) when label proportions do not shift. We hope that these findings and the availability of RLSbench will encourage researchers to rigorously evaluate proposed methods in relaxed label shift settings. Code is publicly available at https://github.com/acmi-lab/RLSbench.
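
The two steps of the meta-algorithm can be illustrated with a minimal NumPy sketch. It is not the RLSbench implementation, and the abstract does not specify these details: the sketch assumes that pseudo-balancing means resampling examples with replacement so that each class (given by true labels on source data, or by pseudolabels on target data) appears equally often, and that the final adjustment rescales the classifier's predicted probabilities by the ratio of the estimated target prior to the source prior. The function names `balanced_indices` and `reweight_predictions` are hypothetical, and the target prior estimate is taken as given.

```python
import numpy as np


def balanced_indices(labels, num_classes, rng):
    """Step (i), assumed form: resample indices so that every class that
    appears in `labels` is drawn equally often (with replacement).
    `labels` may be true labels (source) or pseudolabels (target)."""
    labels = np.asarray(labels)
    per_class = [np.flatnonzero(labels == c) for c in range(num_classes)]
    present = [idx for idx in per_class if idx.size > 0]
    n_per_class = max(idx.size for idx in present)
    sampled = [rng.choice(idx, size=n_per_class, replace=True) for idx in present]
    # Shuffle so that classes are interleaved within the epoch.
    return rng.permutation(np.concatenate(sampled))


def reweight_predictions(source_probs, source_prior, target_prior):
    """Step (ii), assumed form: rescale p_s(y|x) by w_y = p_t(y) / p_s(y)
    and renormalise each row so that it sums to one."""
    source_probs = np.asarray(source_probs, dtype=float)
    weights = np.asarray(target_prior, dtype=float) / np.asarray(source_prior, dtype=float)
    adjusted = source_probs * weights
    return adjusted / adjusted.sum(axis=1, keepdims=True)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pseudolabels = np.array([0, 0, 0, 1])  # imbalanced toy pseudolabels
    idx = balanced_indices(pseudolabels, num_classes=2, rng=rng)
    print(np.bincount(pseudolabels[idx], minlength=2))  # class counts after resampling
    print(reweight_predictions([[0.5, 0.5]], [0.5, 0.5], [0.9, 0.1]))
```

In this sketch a classifier trained on balanced data (a uniform `source_prior`) has its outputs shifted toward the classes that the estimated target prior favours. When the two priors coincide, the weights are all one and the predictions are unchanged, consistent with the meta-algorithm's minimal effect when label proportions do not shift.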