Long-tailed class imbalance is unavoidable in many real-world classification problems. Existing long-tailed classification methods focus on the single-domain setting, where all examples are drawn from the same distribution. However, real-world scenarios often involve multiple domains, each with its own imbalanced class distribution. We study this multi-domain long-tailed learning problem and aim to produce a model that generalizes well across all classes and domains. Toward that goal, we introduce TALLY, which produces invariant predictors by augmenting hidden representations in a balanced way across domains and classes. Building on a proposed selective balanced sampling strategy, TALLY mixes the semantic representation of one example with the domain-associated nuisances of another, producing a new representation that is used as data augmentation. To improve the disentanglement of semantic representations, TALLY further uses a domain-invariant class prototype that averages out domain-specific effects. We evaluate TALLY on four long-tailed variants of classical domain generalization benchmarks and two real-world imbalanced multi-domain datasets. The results indicate that TALLY consistently outperforms other state-of-the-art methods under both subpopulation shift and domain shift.
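The mixing step described above can be illustrated with a minimal sketch. The abstract does not say how the semantic part and the nuisances are separated, so the sketch assumes an AdaIN-style split: the normalized feature vector stands in for the semantic representation, and its mean and standard deviation stand in for the domain-associated nuisances. The function names (`split_representation`, `mix_representations`, `sample_balanced_pair`, `class_prototype`) are hypothetical, and the uniform class and domain sampling is only a simplified stand-in for the selective balanced sampling strategy.

```python
import random
from statistics import fmean, pstdev


def split_representation(h, eps=1e-6):
    """Split a feature vector into a normalized part and its (mean, std).

    Assumption: the normalized part plays the role of the semantic
    representation and (mean, std) the role of the domain nuisances.
    """
    mu = fmean(h)
    sigma = pstdev(h) + eps
    semantic = [(x - mu) / sigma for x in h]
    return semantic, (mu, sigma)


def mix_representations(h_semantic_src, h_nuisance_src):
    """Combine the semantic part of one example with the nuisances of another."""
    semantic, _ = split_representation(h_semantic_src)
    _, (mu, sigma) = split_representation(h_nuisance_src)
    return [s * sigma + mu for s in semantic]


def sample_balanced_pair(labels, domains, rng):
    """Pick a class uniformly, then one of its examples (semantic source);
    pick a domain uniformly, then one of its examples (nuisance source).

    Simplified stand-in for the paper's selective balanced sampling.
    """
    c = rng.choice(sorted(set(labels)))
    d = rng.choice(sorted(set(domains)))
    i = rng.choice([k for k, y in enumerate(labels) if y == c])
    j = rng.choice([k for k, g in enumerate(domains) if g == d])
    return i, j


def class_prototype(features, labels, domains, target):
    """Average the semantic parts of one class within each domain first,
    then across domains, so that no single domain dominates the prototype."""
    per_domain = {}
    for h, y, d in zip(features, labels, domains):
        if y == target:
            per_domain.setdefault(d, []).append(split_representation(h)[0])
    domain_means = [[fmean(col) for col in zip(*vecs)]
                    for vecs in per_domain.values()]
    return [fmean(col) for col in zip(*domain_means)]


if __name__ == "__main__":
    rng = random.Random(0)
    features = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0], [4.0, 3.0, 2.0, 1.0]]
    labels = [0, 1, 0]
    domains = ["photo", "sketch", "sketch"]
    i, j = sample_balanced_pair(labels, domains, rng)
    augmented = mix_representations(features[i], features[j])
    # The augmented example keeps the label of the semantic source.
    print(labels[i], augmented)
```

Under these assumptions, the augmented vector carries the mean and standard deviation of the nuisance source while keeping the normalized shape, and the label, of the semantic source.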