Distribution shift poses a significant challenge for deploying modern machine learning models in real-world applications. This work focuses on the target shift problem in a regression setting (Zhang et al., 2013; Nguyen et al., 2016). More specifically, the continuous target variable y (also known as the response variable) has different marginal distributions in the source (training) and target (testing) domains, while the conditional distribution of the features x given y remains the same. While most of the literature focuses on classification tasks with a finite target space, the regression problem has an infinite-dimensional target space, which makes many existing methods inapplicable. In this work, we show that the continuous target shift problem can be addressed by estimating the importance weight function from an ill-posed integral equation. We propose a nonparametric regularized approach named ReTaSA to solve the ill-posed integral equation and provide theoretical justification for the estimated importance weight function. The effectiveness of the proposed method is demonstrated through extensive numerical studies on synthetic and real-world datasets.
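Why the problem reduces to an integral equation: under target shift, p_t(x) = ∫ p(x | y) p_t(y) dy = ∫ p(x | y) w(y) p_s(y) dy with w(y) = p_t(y) / p_s(y), so recovering w means inverting an integral operator (a Fredholm equation of the first kind), which is ill-posed. The sketch below is a rough illustration only and is not ReTaSA. It discretizes that identity on a grid and applies generic Tikhonov (ridge) regularization, minimizing ||A w - b||^2 + lam * ||w||^2. The grid, the Gaussian p(x | y), the densities p_s(y) = 1 and p_t(y) = 2y on [0, 1], and the value of lam are all illustrative assumptions; in practice p_t(x) and the operator would have to be estimated from samples.

```python
import math


def gaussian_kernel(x: float, y: float, sigma: float) -> float:
    """Density of N(y, sigma^2) at x; stands in for the shared p(x | y) (assumption)."""
    return math.exp(-((x - y) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))


def solve_linear(M, v):
    """Solve the square system M z = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(M)]  # augmented copy; inputs untouched
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(aug[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (aug[r][n] - s) / aug[r][r]
    return z


def matvec(A, w):
    return [sum(a * x for a, x in zip(row, w)) for row in A]


def tikhonov_solve(A, b, lam):
    """Minimize ||A w - b||^2 + lam * ||w||^2 via (A^T A + lam I) w = A^T b."""
    n_rows, n_cols = len(A), len(A[0])
    AtA = [[sum(A[k][i] * A[k][j] for k in range(n_rows)) + (lam if i == j else 0.0)
            for j in range(n_cols)] for i in range(n_cols)]
    Atb = [sum(A[k][i] * b[k] for k in range(n_rows)) for i in range(n_cols)]
    return solve_linear(AtA, Atb)


def demo(m=30, sigma=0.1, lam=1e-4):
    """Hypothetical toy shift on [0, 1]: p_s(y) = 1, p_t(y) = 2y, p(x | y) = N(y, sigma^2)."""
    grid = [j / (m - 1) for j in range(m)]
    dy = 1.0 / (m - 1)
    p_s = [1.0] * m
    w_true = [2.0 * y for y in grid]  # w(y) = p_t(y) / p_s(y)
    # Quadrature of the operator: (A w)(x_i) = sum_j p(x_i | y_j) w(y_j) p_s(y_j) dy
    A = [[gaussian_kernel(x, y, sigma) * p_s[j] * dy for j, y in enumerate(grid)]
         for x in grid]
    b = matvec(A, w_true)  # p_t(x_i) under the same quadrature, so A w_true = b
    w_hat = tikhonov_solve(A, b, lam)
    return A, b, w_true, w_hat


if __name__ == "__main__":
    A, b, w_true, w_hat = demo()
    for idx in (5, 15, 25):
        print(idx, round(w_true[idx], 3), round(w_hat[idx], 3))
```

The penalty trades a small bias for stability: because w_true solves A w = b exactly here, the minimizer is guaranteed to satisfy ||A w_hat - b||^2 <= lam * ||w_true||^2 and ||w_hat|| <= ||w_true||. When the right-hand side is estimated from data, the unregularized inverse typically amplifies noise, which is why the choice of lam matters.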