DiscoVars: A New Data Analysis Perspective -- Application in Variable Selection for Clustering - 专知论文

会员服务 ·

0

变量选择 · 中心性 · 数据分析 · 度量 · 分析 ·

2023 年 4 月 8 日

DiscoVars: A New Data Analysis Perspective -- Application in Variable Selection for Clustering

翻译：DiscoVars：一种新的数据分析视角——在聚类变量选择中的应用

from arxiv, 13 Pages, Technical Report, Verikar Software

We present a new data analysis perspective to determine variable importance regardless of the underlying learning task. Traditionally, variable selection is considered an important step in supervised learning for both classification and regression problems. The variable selection also becomes critical when costs associated with the data collection and storage are considerably high for cases like remote sensing. Therefore, we propose a new methodology to select important variables from the data by first creating dependency networks among all variables and then ranking them (i.e. nodes) by graph centrality measures. Selecting Top-$n$ variables according to preferred centrality measure will yield a strong candidate subset of variables for further learning tasks e.g. clustering. We present our tool as a Shiny app which is a user-friendly interface development environment. We also extend the user interface for two well-known unsupervised variable selection methods from literature for comparison reasons.

翻译：我们提出了一种新的数据分析视角，用于确定变量重要性，而无需考虑底层学习任务。传统上，变量选择被视为分类和回归问题中监督学习的重要步骤。当数据收集和存储成本较高（如遥感案例）时，变量选择也显得至关重要。因此，我们提出了一种新方法，首先通过构建所有变量之间的依赖网络，然后通过图中心性指标对节点进行排序，从而从数据中选择重要变量。根据首选中心性指标选择Top-$n$变量，将为后续学习任务（例如聚类）生成一个强候选变量子集。我们将此工具以Shiny应用程序的形式呈现，这是一种用户友好的界面开发环境。我们还扩展了用户界面，以集成文献中两种著名的无监督变量选择方法，用于比较分析。

0

相关内容

变量选择

【干货书】数据分析优化，Optimization for Modern Data Analysis，117页pdf

【干货书】数据分析优化，Optimization for Modern Data Analysis，117页pdf

专知会员服务

66+阅读 · 2023年2月15日

【2022新书】Python数据分析第三版，与Pandas、NumPy和Jupyter进行数据争论

【2022新书】Python数据分析第三版，与Pandas、NumPy和Jupyter进行数据争论

专知会员服务

125+阅读 · 2022年10月16日

机器学习损失函数概述，Loss Functions in Machine Learning

机器学习损失函数概述，Loss Functions in Machine Learning

专知会员服务

85+阅读 · 2022年3月19日

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

斯坦福CS246《大数据挖掘》2021课程开始了！Jure Leskovec大牛主讲，附课程PPT下载

斯坦福CS246《大数据挖掘》2021课程开始了！Jure Leskovec大牛主讲，附课程PPT下载

专知会员服务

61+阅读 · 2021年5月10日

剑桥大学《数据科学: 原理与实践》课程，附PPT下载

剑桥大学《数据科学: 原理与实践》课程，附PPT下载

专知会员服务

54+阅读 · 2021年1月20日

【经典书】使用机器学习R语言，149页pdf，Practical Machine Learning in R

【经典书】使用机器学习R语言，149页pdf，Practical Machine Learning in R

专知会员服务

24+阅读 · 2021年1月13日

【论文推荐WWW2020-UIUC】修正排序系统中的选择偏差：Correcting for Selection Bias in Learning-to-rank Systems

【论文推荐WWW2020-UIUC】修正排序系统中的选择偏差：Correcting for Selection Bias in Learning-to-rank Systems

专知会员服务

32+阅读 · 2020年2月1日

【电子书】机器学习实战（Machine Learning in Action），附PDF

【电子书】机器学习实战（Machine Learning in Action），附PDF

专知会员服务

132+阅读 · 2019年11月25日

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

专知会员服务

247+阅读 · 2019年10月21日

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

图与推荐

2+阅读 · 2022年11月2日

GNN 新基准！Long Range Graph Benchmark

GNN 新基准！Long Range Graph Benchmark

图与推荐

0+阅读 · 2022年10月18日

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

【干货书】基于统计和机器学习的实用时间序列分析预测，Time Series Analysis Prediction

【干货书】基于统计和机器学习的实用时间序列分析预测，Time Series Analysis Prediction

专知

18+阅读 · 2022年4月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

深度自进化聚类：Deep Self-Evolution Clustering

深度自进化聚类：Deep Self-Evolution Clustering

我爱读PAMI

15+阅读 · 2019年4月13日

LibRec 精选：推荐系统的常用数据集

LibRec 精选：推荐系统的常用数据集

LibRec智能推荐

17+阅读 · 2019年2月15日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

HIF-1/COMPASS调控缺氧诱导Brg1和Brm表达上调的机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

高维半参数模型假设检验问题的研究

国家自然科学基金

1+阅读 · 2015年12月31日

测量误差数据下约束线性模型的有偏估计及变量选择研究

国家自然科学基金

0+阅读 · 2014年12月31日

复杂数据下含指标项半参数模型结构的统计推断及应用

国家自然科学基金

0+阅读 · 2014年12月31日

大白菜KIN基因的表达及其pre-mRNA加工机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

带约束推断的参数和半参数回归模型有偏估计及变量选择理论与方法研究

国家自然科学基金

1+阅读 · 2012年12月31日

能源与气候变化政策建模中要素替代弹性研究

国家自然科学基金

0+阅读 · 2012年12月31日

病毒基因空间几何算法研究和对新病毒威胁快速探测及预警

国家自然科学基金

0+阅读 · 2012年12月31日

高维数据的假设检验

国家自然科学基金

0+阅读 · 2012年12月31日

新能源汽车产品战略与商业模式

国家自然科学基金

6+阅读 · 2010年11月30日

Addressing bias in online selection with limited budget of comparisons

Arxiv

0+阅读 · 2023年5月26日

Flexible Variable Selection for Clustering and Classification

Arxiv

0+阅读 · 2023年5月25日

Reliable identification of selection mechanisms in language change

Arxiv

0+阅读 · 2023年5月25日

Inference in balanced community modulated recursive trees

Arxiv

0+阅读 · 2023年5月24日

Preliminary investigations on induction over real numbers

Arxiv

0+阅读 · 2023年5月24日

A survey and taxonomy of loss functions in machine learning

Arxiv

28+阅读 · 2023年1月13日

A continual learning survey: Defying forgetting in classification tasks

Arxiv

32+阅读 · 2021年4月16日

A Survey on Multi-Task Learning

Arxiv

32+阅读 · 2021年3月29日

A Collective Learning Framework to Boost GNN Expressiveness

A Collective Learning Framework to Boost GNN Expressiveness

Arxiv

20+阅读 · 2020年3月26日

Knowledge-aware Graph Neural Networks with Label Smoothness Regularization for Recommendation

Arxiv

11+阅读 · 2019年6月13日

VIP会员

文章信息

相关主题

最新内容

印度精确打击与指挥架构的断层

印度精确打击与指挥架构的断层

专知会员服务

4+阅读 · 7月20日

《NASA喷气推进实验室：高耐久轻质常驻空观测系统（HELIOS）》429页

《NASA喷气推进实验室：高耐久轻质常驻空观测系统（HELIOS）》429页

专知会员服务

6+阅读 · 7月20日

美空军AI完成F-16战斗机自主空战历史性试飞

美空军AI完成F-16战斗机自主空战历史性试飞

专知会员服务

6+阅读 · 7月20日

《美政府问责局——武器系统年度评估（2026年）：强制要求成熟技术或可推动转向快速交付》249页

《美政府问责局——武器系统年度评估（2026年）：强制要求成熟技术或可推动转向快速交付》249页

专知会员服务

6+阅读 · 7月20日

《美国陆军：通过弹性分布式模型库实现自适应AI优势》

《美国陆军：通过弹性分布式模型库实现自适应AI优势》

专知会员服务

4+阅读 · 7月20日

博士论文 | 理解与改进大语言模型推理：从反转诅咒到连续思维链

博士论文 | 理解与改进大语言模型推理：从反转诅咒到连续思维链

专知会员服务

7+阅读 · 7月20日

综述 | 终身视觉表征：持续自监督学习CSSL系统综述

综述 | 终身视觉表征：持续自监督学习CSSL系统综述

专知会员服务

6+阅读 · 7月20日

深入Project Maven：为何人工智能在战场上依然失灵

深入Project Maven：为何人工智能在战场上依然失灵

专知会员服务

14+阅读 · 7月19日

锻造未来士兵：外骨骼、基因工程与赛博格

锻造未来士兵：外骨骼、基因工程与赛博格

专知会员服务

7+阅读 · 7月19日

《无人机系统（UAS）通信网状网络试验性部署》50页报告

《无人机系统（UAS）通信网状网络试验性部署》50页报告

专知会员服务

9+阅读 · 7月19日

《无人机蜂群通信技术研究》50页

《无人机蜂群通信技术研究》50页

专知会员服务

10+阅读 · 7月19日

《基于智能体建模与仿真的无人机蜂群模型目标定位涌现行为比较分析》360页

《基于智能体建模与仿真的无人机蜂群模型目标定位涌现行为比较分析》360页

专知会员服务

15+阅读 · 7月18日

欧洲智能弹药战略创新管理：迈向制导弹药、巡飞系统与自主无人机蜂群的技术主权研究路线图

欧洲智能弹药战略创新管理：迈向制导弹药、巡飞系统与自主无人机蜂群的技术主权研究路线图

专知会员服务

8+阅读 · 7月18日

从领域适配到部署与可解释：Berkeley博士论文解析大语言模型真实落地

从领域适配到部署与可解释：Berkeley博士论文解析大语言模型真实落地

专知会员服务

16+阅读 · 7月18日

综述 | 长程智能体研究全景：基础、演化、框架、优化与前沿

综述 | 长程智能体研究全景：基础、演化、框架、优化与前沿

专知会员服务

11+阅读 · 7月18日

相关VIP内容

【干货书】数据分析优化，Optimization for Modern Data Analysis，117页pdf

【干货书】数据分析优化，Optimization for Modern Data Analysis，117页pdf

专知会员服务

66+阅读 · 2023年2月15日

【2022新书】Python数据分析第三版，与Pandas、NumPy和Jupyter进行数据争论

【2022新书】Python数据分析第三版，与Pandas、NumPy和Jupyter进行数据争论

专知会员服务

125+阅读 · 2022年10月16日

机器学习损失函数概述，Loss Functions in Machine Learning

机器学习损失函数概述，Loss Functions in Machine Learning

专知会员服务

85+阅读 · 2022年3月19日

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

斯坦福CS246《大数据挖掘》2021课程开始了！Jure Leskovec大牛主讲，附课程PPT下载

斯坦福CS246《大数据挖掘》2021课程开始了！Jure Leskovec大牛主讲，附课程PPT下载

专知会员服务

61+阅读 · 2021年5月10日

剑桥大学《数据科学: 原理与实践》课程，附PPT下载

剑桥大学《数据科学: 原理与实践》课程，附PPT下载

专知会员服务

54+阅读 · 2021年1月20日

【经典书】使用机器学习R语言，149页pdf，Practical Machine Learning in R

【经典书】使用机器学习R语言，149页pdf，Practical Machine Learning in R

专知会员服务

24+阅读 · 2021年1月13日

【论文推荐WWW2020-UIUC】修正排序系统中的选择偏差：Correcting for Selection Bias in Learning-to-rank Systems

【论文推荐WWW2020-UIUC】修正排序系统中的选择偏差：Correcting for Selection Bias in Learning-to-rank Systems

专知会员服务

32+阅读 · 2020年2月1日

【电子书】机器学习实战（Machine Learning in Action），附PDF

【电子书】机器学习实战（Machine Learning in Action），附PDF

专知会员服务

132+阅读 · 2019年11月25日

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

专知会员服务

247+阅读 · 2019年10月21日

热门VIP内容

开通专知VIP会员享更多权益服务

《NASA喷气推进实验室：高耐久轻质常驻空观测系统（HELIOS）》429页

《美政府问责局——武器系统年度评估（2026年）：强制要求成熟技术或可推动转向快速交付》249页

印度精确打击与指挥架构的断层

美空军AI完成F-16战斗机自主空战历史性试飞

相关资讯

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

图与推荐

2+阅读 · 2022年11月2日

GNN 新基准！Long Range Graph Benchmark

GNN 新基准！Long Range Graph Benchmark

图与推荐

0+阅读 · 2022年10月18日

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

【干货书】基于统计和机器学习的实用时间序列分析预测，Time Series Analysis Prediction

【干货书】基于统计和机器学习的实用时间序列分析预测，Time Series Analysis Prediction

专知

18+阅读 · 2022年4月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

深度自进化聚类：Deep Self-Evolution Clustering

深度自进化聚类：Deep Self-Evolution Clustering

我爱读PAMI

15+阅读 · 2019年4月13日

LibRec 精选：推荐系统的常用数据集

LibRec 精选：推荐系统的常用数据集

LibRec智能推荐

17+阅读 · 2019年2月15日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

相关论文

Addressing bias in online selection with limited budget of comparisons

Arxiv

0+阅读 · 2023年5月26日

Flexible Variable Selection for Clustering and Classification

Arxiv

0+阅读 · 2023年5月25日

Reliable identification of selection mechanisms in language change

Arxiv

0+阅读 · 2023年5月25日

Inference in balanced community modulated recursive trees

Arxiv

0+阅读 · 2023年5月24日

Preliminary investigations on induction over real numbers

Arxiv

0+阅读 · 2023年5月24日

A survey and taxonomy of loss functions in machine learning

Arxiv

28+阅读 · 2023年1月13日

A continual learning survey: Defying forgetting in classification tasks

Arxiv

32+阅读 · 2021年4月16日

A Survey on Multi-Task Learning

Arxiv

32+阅读 · 2021年3月29日

A Collective Learning Framework to Boost GNN Expressiveness

A Collective Learning Framework to Boost GNN Expressiveness

Arxiv

20+阅读 · 2020年3月26日

Knowledge-aware Graph Neural Networks with Label Smoothness Regularization for Recommendation

Arxiv

11+阅读 · 2019年6月13日

相关基金

HIF-1/COMPASS调控缺氧诱导Brg1和Brm表达上调的机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

高维半参数模型假设检验问题的研究

国家自然科学基金

1+阅读 · 2015年12月31日

测量误差数据下约束线性模型的有偏估计及变量选择研究

国家自然科学基金

0+阅读 · 2014年12月31日

复杂数据下含指标项半参数模型结构的统计推断及应用

国家自然科学基金

0+阅读 · 2014年12月31日

大白菜KIN基因的表达及其pre-mRNA加工机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

带约束推断的参数和半参数回归模型有偏估计及变量选择理论与方法研究

国家自然科学基金

1+阅读 · 2012年12月31日

能源与气候变化政策建模中要素替代弹性研究

国家自然科学基金

0+阅读 · 2012年12月31日

病毒基因空间几何算法研究和对新病毒威胁快速探测及预警

国家自然科学基金

0+阅读 · 2012年12月31日

高维数据的假设检验

国家自然科学基金

0+阅读 · 2012年12月31日

新能源汽车产品战略与商业模式

国家自然科学基金

6+阅读 · 2010年11月30日

微信扫码咨询专知VIP会员