In practical machine learning, the environments encountered during the model development and deployment phases often differ, especially when a model is used by many users in diverse settings. Learning models that maintain reliable performance across plausible deployment environments is known as distributionally robust (DR) learning. In this work, we study the problem of distributionally robust feature selection (DRFS), with a particular focus on sparse sensing applications motivated by industrial needs. In practical multi-sensor systems, a shared subset of sensors is typically selected prior to deployment based on performance evaluations using many available sensors. At deployment, individual users may further adapt or fine-tune models to their specific environments. When deployment environments differ from those anticipated during development, this strategy can result in systems lacking sensors required for optimal performance. To address this issue, we propose safe-DRFS, a novel approach that extends safe screening from conventional sparse modeling settings to a DR setting under covariate shift. Our method identifies a feature subset that encompasses all subsets that may become optimal across a specified range of input distribution shifts, with finite-sample theoretical guarantees of no false feature elimination.