25 September 2023

A feature selection method based on Shapley values robust to concept shift in regression


Carlos Sebastián, Carlos E. González-Guillén

Feature selection is one of the most relevant processes in any methodology for creating a statistical learning model. Usually, existing algorithms establish some criterion to select the most influential variables, discarding those that do not contribute any relevant information to the model. This methodology makes sense in a static situation where the joint distribution of the data does not vary over time. However, when dealing with real data, it is common to encounter dataset shift and, specifically, changes in the relationships between variables (concept shift). In this case, the influence of a variable cannot be the only indicator of its quality as a regressor of the model, since the relationship learned in the training phase may not correspond to the current situation. To tackle this problem, our approach establishes a direct relationship between the Shapley values and prediction errors, operating at a more local level to effectively detect the individual biases introduced by each variable. The proposed methodology is evaluated through various examples, including synthetic scenarios mimicking sudden and incremental shift situations, as well as two real-world cases characterized by concept shifts. Additionally, we perform three analyses of standard situations to assess the algorithm's robustness in the absence of shifts. The results demonstrate that our proposed algorithm significantly outperforms state-of-the-art feature selection methods in concept shift scenarios, while matching the performance of existing methodologies in static situations.
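The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of relating per-feature Shapley values to prediction errors; it is not the authors' method. It assumes a linear model with independent features, for which the Shapley value of feature i at point x is w_i (x_i - E[x_i]). The synthetic data, the sign flip used to mimic a concept shift, and the helper names `make_data` and `abs_correlation` are all hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


def make_data(n, coef_x1):
    """Hypothetical synthetic data: y = 2*x0 + coef_x1*x1 + small noise."""
    X = rng.normal(size=(n, 2))
    y = 2.0 * X[:, 0] + coef_x1 * X[:, 1] + rng.normal(scale=0.1, size=n)
    return X, y


# Training period: x1 has a positive effect on y.
X_train, y_train = make_data(500, coef_x1=1.0)
# Recent period: the effect of x1 flips sign (an illustrative concept shift).
X_recent, y_recent = make_data(500, coef_x1=-1.0)

# Ordinary least squares with an intercept, fitted on the training period only.
A = np.column_stack([X_train, np.ones(len(X_train))])
coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
weights, intercept = coef[:2], coef[2]

# Shapley values of a linear model under feature independence:
# phi_i(x) = w_i * (x_i - mean of x_i over the background data).
background_mean = X_train.mean(axis=0)
shap_recent = weights * (X_recent - background_mean)

# Signed prediction errors on the recent period.
predictions = X_recent @ weights + intercept
errors = y_recent - predictions


def abs_correlation(columns, target):
    """Absolute Pearson correlation of each column with the target vector."""
    c = columns - columns.mean(axis=0)
    t = target - target.mean()
    return np.abs(c.T @ t) / (np.linalg.norm(c, axis=0) * np.linalg.norm(t))


# A feature whose contributions track the errors is a candidate for removal,
# however influential it looked during training.
scores = abs_correlation(shap_recent, errors)
print(dict(zip(["x0", "x1"], scores.round(3))))
```

In this toy setting the score for `x1` should be close to 1 and the score for `x0` close to 0, because the errors on the recent period are driven almost entirely by the contribution of `x1`. A selection rule built on such scores would need a threshold and a validation scheme, neither of which is specified in the abstract.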

