Evaluating the performance of machine learning models under distribution shift is challenging, especially when only unlabeled data from the shifted (target) domain is available alongside labeled data from the original (source) domain. Recent work suggests that disagreement, the degree to which two models trained with different randomness differ on the same input, is key to tackling this problem. Empirically, disagreement and prediction error have been shown to be strongly connected, and this connection has been used to estimate model performance. Experiments have also led to the discovery of the disagreement-on-the-line phenomenon: the classification error under the target domain is often a linear function of the classification error under the source domain, and whenever this property holds, disagreement under the source and target domains follows the same linear relation. In this work, we develop a theoretical foundation for analyzing disagreement in high-dimensional random features regression and study under what conditions the disagreement-on-the-line phenomenon occurs in our setting. Experiments on CIFAR-10-C, Tiny ImageNet-C, and Camelyon17 are consistent with our theory and support the universality of the theoretical findings.
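As a rough illustration of how the disagreement-on-the-line property described above could be used to estimate target error without target labels, the following is a minimal sketch and not the method analyzed in this work. It assumes a collection of model pairs, each with a measured source error, source disagreement, and target disagreement. The function names (`disagreement`, `fit_line`, `estimate_target_error`) are hypothetical, and the line is fitted on raw rates with no axis transform, which is a simplifying assumption.

```python
import numpy as np


def disagreement(pred_a, pred_b):
    """Fraction of inputs on which two classifiers predict different labels.

    Requires only the two models' predictions, so it can be computed on
    unlabeled target data.
    """
    pred_a, pred_b = np.asarray(pred_a), np.asarray(pred_b)
    return float(np.mean(pred_a != pred_b))


def fit_line(x, y):
    """Least-squares slope and intercept of y regressed on x."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return float(slope), float(intercept)


def estimate_target_error(source_err, source_dis, target_dis):
    """Estimate target error from source error (hypothetical helper).

    Assumption: disagreement-on-the-line holds, so the line relating source
    and target disagreement is reused for source and target error.
    """
    slope, intercept = fit_line(source_dis, target_dis)
    return slope * np.asarray(source_err, dtype=float) + intercept
```

Here `source_dis` and `target_dis` would each hold one `disagreement(...)` value per model pair, and `source_err` the labeled source error of the models whose target error is to be estimated. If the linear relation does not hold in a given setting, the estimate has no justification.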