In this study, we propose Distributionally Robust Safe Screening (DRSS), a method for identifying unnecessary samples and features in a distributionally robust (DR) covariate-shift setting. The method combines DR learning, a paradigm aimed at making models robust to variations in the data distribution, with safe screening (SS), a sparse optimization technique that identifies irrelevant samples and features before model training. The core idea of DRSS is to reformulate the DR covariate-shift problem as a weighted empirical risk minimization problem in which the weights are uncertain within a predetermined range. By extending SS to accommodate this weight uncertainty, DRSS can reliably identify samples and features that are unnecessary under any future distribution within the specified range. We provide a theoretical guarantee for the DRSS method and validate its performance through numerical experiments on both synthetic and real-world datasets.
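
To make the reformulation concrete, the weighted empirical risk minimization problem with uncertain weights can be sketched as follows. The notation is illustrative and is not taken from the abstract: the loss \ell, the regularizer \rho, the weight vector w, and the uncertainty set \mathcal{W} (standing for the "predetermined range") are all assumed symbols.

```latex
% Weighted ERM whose sample weights w are only known to lie in a set W
% (assumed notation: n samples (x_i, y_i), loss \ell, regularizer \rho).
\beta^{*}(w) \in \operatorname*{arg\,min}_{\beta \in \mathbb{R}^{d}}
  \sum_{i=1}^{n} w_i \, \ell\bigl(y_i, x_i^{\top}\beta\bigr) + \rho(\beta),
\qquad w \in \mathcal{W}.

% Robust screening requirement, stated informally: a sample or feature
% may be discarded only if it is unnecessary for every admissible w.
\text{discard item } k
\quad\text{only if}\quad
k \text{ is unnecessary for } \beta^{*}(w) \ \text{for all } w \in \mathcal{W}.
```

Under this reading, ordinary safe screening corresponds to the special case in which \mathcal{W} contains a single fixed weight vector, and the distributionally robust variant asks for the same guarantee to hold uniformly over the whole set.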