Distribution shift can be a primary cause of performance degradation in machine learning models. This paper examines the characteristics of such shifts, motivated primarily by Real-Time Bidding (RTB) market models. We emphasize the challenges posed by class imbalance and sample selection bias, both of which are potent sources of distribution shift. To address distribution shift in data, this paper presents the Exponential Tilt Reweighting Alignment (ExTRA) algorithm proposed by Marty et al. (2023). ExTRA estimates importance weights on the source data so as to minimize the KL divergence between the weighted source distribution and the target distribution. A notable advantage of the method is that it requires only labeled source data and unlabeled target data. Using simulated data that mimics real-world conditions, we investigate the nature of distribution shift and evaluate the applicability of the proposed method.
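The reweighting idea described in the abstract can be illustrated with a minimal sketch. This is not the ExTRA implementation of Marty et al. (2023): it is a simplified, label-free exponential tilt that assumes the density ratio is log-linear in the raw features, so that the source weights are proportional to exp(theta · x). Under that assumption, minimizing the empirical KL divergence from the target to the weighted source is equivalent to maximizing theta · mean_target(x) - log mean_source(exp(theta · x)), a concave objective. The function names, the learning rate, the step count, and the toy Gaussian data are all hypothetical choices made for illustration.

```python
import numpy as np


def fit_exponential_tilt(source_x, target_x, lr=0.1, n_steps=2000):
    """Fit theta for weights w_i proportional to exp(theta . x_i) on the source.

    Simplifying assumption (not taken from the paper): a single linear tilt on
    the raw features, fitted by plain gradient ascent on the concave objective
        theta . mean_target(x) - log(mean_source(exp(theta . x))).
    """
    theta = np.zeros(source_x.shape[1])
    target_mean = target_x.mean(axis=0)
    for _ in range(n_steps):
        logits = source_x @ theta
        logits -= logits.max()          # subtract the max for numerical stability
        w = np.exp(logits)
        w /= w.sum()                    # self-normalized weights, sum to 1
        # Gradient: target feature mean minus weighted source feature mean.
        theta += lr * (target_mean - w @ source_x)
    return theta


def importance_weights(source_x, theta):
    """Return source weights proportional to exp(theta . x), scaled to mean 1."""
    logits = source_x @ theta
    logits -= logits.max()
    w = np.exp(logits)
    return w * len(w) / w.sum()


# Toy data (hypothetical): the source is N(0, 1) and the target is N(1, 1),
# for which the true density ratio is exp(x - 0.5), i.e. theta = 1.
rng = np.random.default_rng(0)
source_x = rng.normal(0.0, 1.0, size=(5000, 1))
target_x = rng.normal(1.0, 1.0, size=(5000, 1))

theta = fit_exponential_tilt(source_x, target_x)
weights = importance_weights(source_x, theta)
weighted_source_mean = np.average(source_x, axis=0, weights=weights)
```

At the optimum, the weighted source feature mean matches the target feature mean, which is the moment-matching property of exponential tilting. Only unlabeled target features are needed to fit the weights, but the sketch ignores the source labels entirely, whereas the method described above makes use of labeled source data.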