The predictive performance of machine learning models trained with empirical risk minimization (ERM) can degrade considerably under distribution shifts. In particular, spurious correlations in the training data lead ERM-trained models to incur high loss on minority groups in which these correlations do not hold. Many methods have been proposed to improve worst-group robustness. However, they require group information for each training input or, at least, a validation set with group labels to tune their hyperparameters, and such annotations may be expensive to obtain or unknown a priori. In this paper, we address the challenge of improving group robustness without group annotations during either training or validation. To this end, we propose to partition the training dataset into groups based on Gram matrices of features extracted by an ``identification'' model and to apply robust optimization based on these pseudo-groups. In the realistic setting where no group labels are available, our experiments show that our approach not only improves group robustness over ERM but also outperforms all recent baselines.
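The two-stage idea described above (pseudo-groups from Gram matrices of features, then a robust objective over those groups) can be illustrated with a minimal NumPy sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the feature maps are assumed to be arrays of shape (N, C, H, W) already produced by some identification model, the partition uses a plain k-means with a deterministic farthest-point initialisation, and "robust optimization" is represented only by evaluating a worst-group mean loss rather than by a full training loop. The function names are hypothetical.

```python
import numpy as np


def gram_matrix(feature_map):
    """Map one (C, H, W) feature map to its (C, C) Gram matrix, scaled by 1/(H*W)."""
    c = feature_map.shape[0]
    flat = feature_map.reshape(c, -1)
    return flat @ flat.T / flat.shape[1]


def gram_descriptors(feature_maps):
    """Map (N, C, H, W) features to (N, C*(C+1)/2) vectors.

    Each Gram matrix is symmetric, so only its upper triangle is kept.
    """
    c = feature_maps.shape[1]
    upper = np.triu_indices(c)
    return np.stack([gram_matrix(f)[upper] for f in feature_maps])


def farthest_point_init(x, k):
    """Deterministic initialisation: start at x[0], then repeatedly add the
    point that is farthest from all centers chosen so far."""
    centers = [x[0]]
    for _ in range(k - 1):
        d = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[d.argmax()])
    return np.stack(centers)


def kmeans(x, k, n_iter=50):
    """Plain Lloyd's k-means (an assumed choice of partitioning method).

    Returns one pseudo-group label per row of x.
    """
    centers = farthest_point_init(x, k)
    labels = np.zeros(len(x), dtype=int)
    for _ in range(n_iter):
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.stack([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def worst_group_loss(losses, groups):
    """Largest per-group mean loss: the quantity a group-robust objective
    would minimise in place of the overall mean."""
    return max(losses[groups == g].mean() for g in np.unique(groups))
```

In an actual pipeline, `gram_descriptors` would be fed activations from the identification model, the labels returned by `kmeans` would serve as pseudo-groups, and the worst-group loss would be minimised over model parameters (for example with a group-robust optimizer) instead of merely being evaluated. The number of pseudo-groups `k` is a free choice in this sketch.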