Supervised learning is often affected by covariate shift, in which the marginal distributions of instances (covariates $x$) of the training and testing samples, $\mathrm{p}_\text{tr}(x)$ and $\mathrm{p}_\text{te}(x)$, differ but the label conditionals coincide. Existing approaches address covariate shift either by using the ratio $\mathrm{p}_\text{te}(x)/\mathrm{p}_\text{tr}(x)$ to weight training samples (reweighting methods) or by using the ratio $\mathrm{p}_\text{tr}(x)/\mathrm{p}_\text{te}(x)$ to weight testing samples (robust methods). However, the performance of such approaches can be poor under support mismatch or when these ratios take large values. We propose a minimax risk classification (MRC) approach for covariate shift adaptation that avoids these limitations by weighting both training and testing samples. In addition, we develop effective techniques that obtain both sets of weights and generalize the conventional kernel mean matching method. We provide novel generalization bounds for our method that show a significant increase in the effective sample size compared with reweighting methods. The proposed method also achieves enhanced classification performance in both synthetic and empirical experiments.
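To make the remark about large ratios concrete, the following minimal sketch shows how weighting training samples by $\mathrm{p}_\text{te}(x)/\mathrm{p}_\text{tr}(x)$ can shrink the number of samples that effectively contribute. It is an illustration under stated assumptions, not the method proposed here: the two marginals are assumed to be unit-variance Gaussians, the ratio is computed in closed form rather than estimated, and the Kish formula $(\sum_i w_i)^2 / \sum_i w_i^2$ is used as a common heuristic for effective sample size, which need not match the quantity in the generalization bounds. The function names are hypothetical.

```python
import math
import random


def gaussian_density_ratio(x, shift):
    """Ratio p_te(x) / p_tr(x), assuming p_tr = N(0, 1) and p_te = N(shift, 1)."""
    # The normalizing constants cancel, leaving exp(shift * x - shift^2 / 2).
    return math.exp(shift * x - 0.5 * shift ** 2)


def kish_effective_sample_size(weights):
    """Kish heuristic: (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return total * total / total_sq


def ess_under_shift(shift, n=2000, seed=0):
    """Draw n training points from N(0, 1) and return the Kish effective
    sample size of their importance weights toward N(shift, 1)."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ws = [gaussian_density_ratio(x, shift) for x in xs]
    return kish_effective_sample_size(ws)


if __name__ == "__main__":
    # Larger shifts produce a few very large weights and a smaller value.
    for shift in (0.0, 0.5, 1.0, 2.0):
        print(shift, round(ess_under_shift(shift), 1))
```

With no shift every weight equals one and the heuristic returns the full sample count; as the assumed test mean moves away from the training mean, a few samples dominate the weighted sum and the value drops sharply.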