Long-tailed class imbalance is unavoidable in many real-world classification problems. Existing methods for this problem assume that all examples come from a single distribution. In many cases, however, the data span multiple domains, each with its own class imbalance. We study this multi-domain long-tailed learning problem and aim to produce a model that generalizes well across all classes and domains. To that end, we introduce TALLY. Built on a proposed selective balanced sampling strategy, TALLY mixes the semantic representation of one example with the domain-associated nuisances of another, producing a new representation that serves as data augmentation. To improve the disentanglement of semantic representations, TALLY further uses a domain-invariant class prototype that averages out domain-specific effects. We evaluate TALLY on several benchmarks and real-world datasets and find that it consistently outperforms other state-of-the-art methods under both subpopulation shift and domain shift. Our code and data are available at https://github.com/huaxiuyao/TALLY.
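To make the augmentation idea concrete, the following is a minimal sketch and not the released TALLY implementation. The abstract does not say how semantic and nuisance factors are separated, so the sketch assumes an AdaIN-style split: per-channel feature statistics stand in for the domain-associated nuisances and the normalized feature stands in for the semantic representation. The function names `split_semantic_nuisance`, `swap_nuisance`, and `domain_invariant_prototype` are hypothetical, and the prototype is assumed to be an equal-weight average of per-domain class means.

```python
import numpy as np


def split_semantic_nuisance(feat, eps=1e-6):
    """Split a (C, H, W) feature map into a normalized part and channel statistics.

    Assumption: channel-wise mean/std approximate domain-associated nuisances,
    and the normalized map approximates the semantic representation.
    """
    mu = feat.mean(axis=(1, 2), keepdims=True)
    sigma = feat.std(axis=(1, 2), keepdims=True) + eps
    return (feat - mu) / sigma, mu, sigma


def swap_nuisance(feat_a, feat_b):
    """Combine the semantic part of feat_a with the nuisance statistics of feat_b."""
    semantic_a, _, _ = split_semantic_nuisance(feat_a)
    _, mu_b, sigma_b = split_semantic_nuisance(feat_b)
    return semantic_a * sigma_b + mu_b


def domain_invariant_prototype(feats, labels, domains, target_class):
    """Average the per-domain means of one class so each domain counts equally.

    feats: (N, D) array; labels, domains: (N,) integer arrays.
    Assumes target_class occurs in at least one domain.
    """
    per_domain_means = []
    for d in np.unique(domains):
        mask = (labels == target_class) & (domains == d)
        if mask.any():
            per_domain_means.append(feats[mask].mean(axis=0))
    return np.mean(per_domain_means, axis=0)


# Usage: an example from a tail class borrows the "style" of another domain.
rng = np.random.default_rng(0)
tail_example = rng.normal(size=(4, 5, 5))
other_domain_example = 3.0 * rng.normal(size=(4, 5, 5)) + 7.0
augmented = swap_nuisance(tail_example, other_domain_example)
```

Under these assumptions, `augmented` keeps the normalized content of `tail_example` while taking on the channel statistics of `other_domain_example`. Averaging per-domain means, rather than pooling all examples, keeps a domain with many examples from dominating the class prototype.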