Different distribution shifts require different interventions, and algorithms must be grounded in the specific shifts they address. However, methodological development for robust algorithms typically relies on structural assumptions that lack empirical validation. Advocating an empirically grounded, data-driven approach to research, we build an empirical testbed comprising natural shifts across 5 tabular datasets and 60,000 method configurations encompassing imbalanced learning and distributionally robust optimization (DRO) methods. We find $Y|X$-shifts are most prevalent on our testbed, in stark contrast to the heavy focus on $X$ (covariate) shifts in the ML literature. The performance of robust algorithms varies significantly across shift types and is no better than that of vanilla methods. To understand why, we conduct an in-depth empirical analysis of DRO methods and find that, although often neglected by researchers, implementation details -- such as the choice of underlying model class (e.g., XGBoost) and hyperparameter selection -- have a bigger impact on performance than the ambiguity set or its radius. To further bridge the gap between methodological research and practice, we design case studies that illustrate how such a data-driven, inductive understanding of distribution shifts can enhance both data-centric and algorithmic interventions.
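The distinction between $X$-shifts and $Y|X$-shifts can be made concrete with a minimal synthetic sketch (not drawn from the paper's testbed; all distributions and the toy threshold classifier below are illustrative assumptions). Under an $X$-shift the covariate distribution moves but the labeling rule $P(Y|X)$ is unchanged, so a classifier that captured the rule keeps working; under a $Y|X$-shift the covariates look the same but the labeling rule itself changes, and the same classifier fails badly:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Source distribution: X ~ N(0, 1), Y = 1[X + noise > 0].
x_src = rng.normal(0.0, 1.0, n)
y_src = (x_src + rng.normal(0.0, 0.5, n) > 0).astype(int)

# A trivial classifier fit to the source: predict 1 when x > 0.
def predict(x):
    return (x > 0.0).astype(int)

acc_src = (predict(x_src) == y_src).mean()

# X-shift: covariates move (mean 0 -> 1.5), but P(Y|X) is unchanged.
x_cov = rng.normal(1.5, 1.0, n)
y_cov = (x_cov + rng.normal(0.0, 0.5, n) > 0).astype(int)
acc_cov = (predict(x_cov) == y_cov).mean()

# Y|X-shift: covariates unchanged, but the labeling rule flips sign.
x_cond = rng.normal(0.0, 1.0, n)
y_cond = (-x_cond + rng.normal(0.0, 0.5, n) > 0).astype(int)
acc_cond = (predict(x_cond) == y_cond).mean()

print(f"source acc:    {acc_src:.2f}")
print(f"X-shift acc:   {acc_cov:.2f}")   # stays high: the rule still holds
print(f"Y|X-shift acc: {acc_cond:.2f}")  # collapses: the rule changed
```

Because the $Y|X$-shift leaves the marginal of $X$ untouched, no amount of covariate reweighting helps here, which is one reason interventions designed for $X$-shifts can be ineffective against the $Y|X$-shifts the abstract reports as most prevalent.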