Many existing approaches for estimating parameters in settings with distributional shifts operate under an invariance assumption. For example, under covariate shift, it is assumed that p(y|x) remains invariant. We refer to such distribution shifts as sparse, since they may be substantial but affect only part of the data-generating system. In contrast, in various real-world settings, shifts might be dense. More specifically, these dense distributional shifts may arise through numerous small and random changes in the population and environment. First, we will discuss empirical evidence for such random dense distributional shifts and explain why commonly used models for distribution shifts, including adversarial approaches, may not be appropriate under these conditions. Then, we will develop tools to infer parameters and make predictions for partially observed, shifted distributions. Finally, we will apply the framework to several real-world data sets and discuss diagnostics to evaluate the fit of the distributional uncertainty model.