Recent work has shown that standard training via empirical risk minimization (ERM) can produce models that achieve high accuracy on average but low accuracy on underrepresented groups due to the prevalence of spurious features. A predominant approach to this group robustness problem minimizes the worst-group error on the training data (akin to a minimax strategy), in the hope that the resulting model will generalize well to the test data. However, this is often suboptimal, especially when the out-of-distribution (OOD) test data contains previously unseen groups. Inspired by ideas from the information retrieval and learning-to-rank literature, this paper first proposes using Discounted Cumulative Gain (DCG) as a metric of model quality to facilitate better hyperparameter tuning and model selection. Being a ranking-based metric, DCG assigns weight to multiple poorly performing groups (instead of considering only the group with the worst performance). As a natural next step, we build on our results to propose a ranking-based training method called Discounted Rank Upweighting (DRU), which differentially reweights a ranked list of poorly performing groups in the training data to learn models that exhibit strong OOD performance on the test data. Results on several synthetic and real-world datasets highlight the superior generalization ability of our group-ranking-based (akin to soft-minimax) approach in selecting and learning models that are robust to group distributional shifts.
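To make the ranking idea concrete, the following is a minimal Python sketch, not the paper's implementation. The abstract does not state the exact formulas, so the sketch assumes the standard logarithmic DCG discount 1/log2(rank + 1), assumes per-group accuracy as the gain, and ranks groups from worst to best. The names `discount`, `group_dcg`, and `rank_weights` are hypothetical and introduced only for illustration.

```python
import math


def discount(rank: int) -> float:
    """Standard logarithmic DCG discount for a 1-indexed rank (assumed here)."""
    return 1.0 / math.log2(rank + 1)


def group_dcg(group_accuracies):
    """DCG-style model score over groups ranked from worst to best accuracy.

    Assumption: the gain is the per-group accuracy, so the worst group gets
    the largest discount factor and better groups count progressively less.
    """
    ranked = sorted(group_accuracies)  # worst-performing group first
    return sum(acc * discount(r) for r, acc in enumerate(ranked, start=1))


def rank_weights(group_losses):
    """Illustrative DRU-style training weights, one per group.

    Assumption: the highest-loss group is rank 1, each group's raw weight is
    its discount, and the weights are normalized to sum to 1.
    """
    order = sorted(range(len(group_losses)),
                   key=lambda g: group_losses[g], reverse=True)
    raw = [0.0] * len(group_losses)
    for r, g in enumerate(order, start=1):
        raw[g] = discount(r)
    total = sum(raw)
    return [w / total for w in raw]


# Two hypothetical models with the same worst-group accuracy (0.5).
model_c = [0.5, 0.6, 0.99]
model_d = [0.5, 0.9, 0.9]
print(group_dcg(model_c), group_dcg(model_d))

# Hypothetical per-group training losses; the worst group is upweighted most.
print(rank_weights([0.2, 1.5, 0.7]))
```

Under these assumptions, a worst-group criterion cannot separate the two hypothetical models, whereas the DCG-style score favors the one whose second-worst group also performs well. The weighting function shows the corresponding soft-minimax behavior: every group keeps a nonzero weight, but poorly performing groups receive more of it.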