Prediction models can perform poorly when deployed to target distributions different from the training distribution. To understand these operational failure modes, we develop a method, called DIstribution Shift DEcomposition (DISDE), to attribute a drop in performance to different types of distribution shifts. Our approach decomposes the performance drop into terms for 1) an increase in harder but frequently seen examples from training, 2) changes in the relationship between features and outcomes, and 3) poor performance on examples infrequent or unseen during training. These terms are defined by fixing a distribution on $X$ while varying the conditional distribution of $Y \mid X$ between training and target, or by fixing the conditional distribution of $Y \mid X$ while varying the distribution on $X$. In order to do this, we define a hypothetical distribution on $X$ consisting of values common in both training and target, over which it is easy to compare $Y \mid X$ and thus predictive performance. We estimate performance on this hypothetical distribution via reweighting methods. Empirically, we show how our method can 1) inform potential modeling improvements across distribution shifts for employment prediction on tabular census data, and 2) help to explain why certain domain adaptation methods fail to improve model performance for satellite image classification.
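As a rough illustration of the reweighting step, the sketch below computes the three terms from per-example losses and the outputs of a domain classifier. It assumes several details that the abstract does not state. The shared distribution is taken to have density proportional to $p(x)q(x)/(p(x)+q(x))$, where $p$ and $q$ are the training and target densities of $X$. `pi` is a hypothetical domain classifier's estimate of the probability that an example comes from the target, and `alpha` is the target fraction of the pooled data used to fit that classifier. All function and argument names are illustrative only.

```python
from typing import Dict, Sequence


def density_ratio(pi: float, alpha: float, eps: float = 1e-6) -> float:
    """Estimate r(x) = q(x) / p(x) from a domain-classifier probability.

    Assumes pi ~= alpha*q / (alpha*q + (1 - alpha)*p), i.e. the classifier
    was fit on pooled data with a fraction `alpha` of target examples.
    """
    pi = min(max(pi, eps), 1.0 - eps)  # clip to avoid division by zero
    return pi / (1.0 - pi) * (1.0 - alpha) / alpha


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Self-normalized weighted average."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total


def decompose_shift(
    loss_train: Sequence[float],
    loss_target: Sequence[float],
    pi_train: Sequence[float],
    pi_target: Sequence[float],
    alpha: float,
) -> Dict[str, float]:
    """Split the mean-loss gap (target minus training) into three terms."""
    r_train = [density_ratio(p, alpha) for p in pi_train]
    r_target = [density_ratio(p, alpha) for p in pi_target]

    # Assumed shared density s ∝ p*q / (p + q), so that
    #   s/p ∝ r / (1 + r)  (weights on training examples)
    #   s/q ∝ 1 / (1 + r)  (weights on target examples)
    w_train = [r / (1.0 + r) for r in r_train]
    w_target = [1.0 / (1.0 + r) for r in r_target]

    perf_train = sum(loss_train) / len(loss_train)
    perf_target = sum(loss_target) / len(loss_target)
    # Shared X distribution, training Y | X
    perf_shared_train = weighted_mean(loss_train, w_train)
    # Shared X distribution, target Y | X
    perf_shared_target = weighted_mean(loss_target, w_target)

    return {
        # 1) X shift from training to shared: harder but common examples
        "x_shift_train_to_shared": perf_shared_train - perf_train,
        # 2) Y | X shift, evaluated on the shared X distribution
        "y_given_x_shift": perf_shared_target - perf_shared_train,
        # 3) X shift from shared to target: rare or unseen examples
        "x_shift_shared_to_target": perf_target - perf_shared_target,
    }
```

Because the terms telescope, they sum to the total performance drop by construction. How well each term is estimated depends on the quality of the domain classifier, which this sketch takes as given.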