Model-based Clustering with Missing Not At Random Data - 专知论文

会员服务 ·

0

簇 · Performer · Learning · 可辨认的 · 类别 ·

2023 年 2 月 15 日

Model-based Clustering with Missing Not At Random Data

翻译：基于模型的非随机缺失数据聚类方法

Aude Sportisse,Matthieu Marbac,Christophe Biernacki,Claire Boyer,Gilles Celeux,Julie Josse,Fabien Laporte

Model-based unsupervised learning, as any learning task, stalls as soon asmissing data occurs. This is even more true when the missing data are infor-mative, or said missing not at random (MNAR). In this paper, we proposemodel-based clustering algorithms designed to handle very general typesof missing data, including MNAR data. To do so, we introduce a mixturemodel for different types of data (continuous, count, categorical and mixed)to jointly model the data distribution and the MNAR mechanism, remainingvigilant to the degrees of freedom of each. Eight different MNAR modelswhich depend on the class membership and/or on the values of the missingvariables themselves are proposed. For a particular type of MNAR mod-els, for which the missingness depends on the class membership, we showthat the statistical inference can be carried out on the data matrix concate-nated with the missing mask considering a MAR mechanism instead; thisspecifically underlines the versatility of the studied MNAR models. Then,we establish sufficient conditions for identifiability of parameters of both thedata distribution and the mechanism. Regardless of the type of data and themechanism, we propose to perform clustering using EM or stochastic EMalgorithms specially developed for the purpose. Finally, we assess the nu-merical performances of the proposed methods on synthetic data and on thereal medical registry TraumaBase as well.

翻译：基于模型的无监督学习，如同任何学习任务一样，一旦出现缺失数据就会陷入停滞。当缺失数据具有信息性（即非随机缺失，MNAR）时，这一情况尤为突出。本文提出了专门处理包括MNAR数据在内的多种通用缺失数据类型（连续型、计数型、类别型及混合型）的基于模型聚类算法。为此，我们引入了一种适用于不同数据类型的混合模型，以联合建模数据分布与MNAR机制，同时警惕各模型自由度。提出了八种依赖于类成员关系和/或缺失变量本身取值的MNAR模型。针对一类特殊的MNAR模型（其缺失机制依赖于类成员关系），我们证明在数据矩阵与缺失掩码拼接后的矩阵上，可考虑将数据视为随机缺失（MAR）机制进行统计推断——这特别凸显了所研究MNAR模型的灵活性。进而，我们建立了数据分布与缺失机制参数可识别性的充分条件。无论数据类型与缺失机制如何，我们均采用为此专门开发的期望最大化（EM）或随机期望最大化（SEM）算法进行聚类。最后，通过合成数据与真实医疗注册数据库TraumaBase评估了所提方法的数值性能。

0

相关内容

【2023新书】使用Python进行统计和数据可视化，554页pdf

【2023新书】使用Python进行统计和数据可视化，554页pdf

专知会员服务

130+阅读 · 2023年1月29日

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

76+阅读 · 2022年6月28日

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

【干货书】机器学习速查手册，135页pdf

【干货书】机器学习速查手册，135页pdf

专知会员服务

128+阅读 · 2020年11月20日

2020数据工程师成长路线图

专知会员服务

41+阅读 · 2020年9月6日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

【电子书推荐】Data Science with Python and Dask

【电子书推荐】Data Science with Python and Dask

专知会员服务

44+阅读 · 2019年6月1日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

深度自进化聚类：Deep Self-Evolution Clustering

深度自进化聚类：Deep Self-Evolution Clustering

我爱读PAMI

15+阅读 · 2019年4月13日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

LibRec 精选：推荐系统的论文与源码

LibRec 精选：推荐系统的论文与源码

LibRec智能推荐

14+阅读 · 2018年11月29日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

【推荐】RNN/LSTM时序预测

【推荐】RNN/LSTM时序预测

机器学习研究会

25+阅读 · 2017年9月8日

【推荐】SVM实例教程

【推荐】SVM实例教程

机器学习研究会

17+阅读 · 2017年8月26日

ARB抑制miR-193a表达促进早期糖尿病肾病壁层上皮细胞-足细胞转分化研究

国家自然科学基金

0+阅读 · 2015年12月31日

重离子储存环CSRe上激光冷却相对论能量类锂12C3+离子束的实验研究

国家自然科学基金

0+阅读 · 2015年12月31日

Heisenberg 群上的 k-平面变换

国家自然科学基金

0+阅读 · 2015年12月31日

关于 Finsler 流形上调和映射与 Laplacian 的若干问题研究

国家自然科学基金

1+阅读 · 2014年12月31日

TMS1基因响应高温胁迫和ER Stress的分子机制

国家自然科学基金

0+阅读 · 2014年12月31日

拟南芥钙依赖型蛋白激酶CPK32响应低氮胁迫的分子机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

幂零李群上热核估计的几个问题

国家自然科学基金

0+阅读 · 2012年12月31日

As(III)在二氧化钛表面的光促吸附机制

国家自然科学基金

0+阅读 · 2012年12月31日

原子转移自由基聚合模板合成尺寸精确可控的Ge,GeO2纳米材料及应用

国家自然科学基金

0+阅读 · 2011年12月31日

含氮石墨烯作为电催化材料的构筑与性能研究

国家自然科学基金

0+阅读 · 2009年12月31日

A Class of Models for Large Zero-inflated Spatial Data

Arxiv

0+阅读 · 2023年4月5日

Partitioning Hypergraphs is Hard: Models, Inapproximability, and Applications

Arxiv

0+阅读 · 2023年4月5日

Many Data: Combine Experimental and Observational Data through a Power Likelihood

Arxiv

0+阅读 · 2023年4月5日

Learning from data with structured missingness

Arxiv

0+阅读 · 2023年4月4日

Data-graph repairs: the preferred approach

Arxiv

0+阅读 · 2023年4月3日

A Revenue Function for Comparison-Based Hierarchical Clustering

Arxiv

0+阅读 · 2023年4月2日

Factorization of Multi-Agent Sampling-Based Motion Planning

Arxiv

0+阅读 · 2023年4月1日

Efficiently transporting average treatment effects using a sufficient subset of effect modifiers

Arxiv

0+阅读 · 2023年3月31日

Updating Embeddings for Dynamic Knowledge Graphs

Arxiv

20+阅读 · 2021年9月22日

Contrastive Clustering

Arxiv

31+阅读 · 2020年9月21日

VIP会员

文章信息

相关主题

最新内容

现代战争的隐蔽系统：伊朗战争十大启示

现代战争的隐蔽系统：伊朗战争十大启示

专知会员服务

0+阅读 · 今天3:58

ICML 2026 | 自回归Boltzmann生成器重塑分子采样

ICML 2026 | 自回归Boltzmann生成器重塑分子采样

专知会员服务

3+阅读 · 6月26日

GNN跨域综述：从消息传递到图基础模型

GNN跨域综述：从消息传递到图基础模型

专知会员服务

4+阅读 · 6月26日

无人机自主控制与人工智能：系统性综述

无人机自主控制与人工智能：系统性综述

专知会员服务

12+阅读 · 6月26日

巡飞弹与反无人机系统——现代战场的两大支柱

巡飞弹与反无人机系统——现代战场的两大支柱

专知会员服务

5+阅读 · 6月26日

《打造“黄金舰队”》57页报告

《打造“黄金舰队”》57页报告

专知会员服务

4+阅读 · 6月26日

《北约数字教官网络发展路径》128页报告

《北约数字教官网络发展路径》128页报告

专知会员服务

3+阅读 · 6月26日

ECCV 2026 | MIMFlow：MIM与归一化流统一图像生成

ECCV 2026 | MIMFlow：MIM与归一化流统一图像生成

专知会员服务

7+阅读 · 6月25日

超越自回归边界：扩散模型、世界模型与SSM如何重塑代码智能

超越自回归边界：扩散模型、世界模型与SSM如何重塑代码智能

专知会员服务

6+阅读 · 6月25日

重塑决策优势：美军作战艺术与多域作战中联盟联合全域指挥控制（CJADC2）体系的融合

重塑决策优势：美军作战艺术与多域作战中联盟联合全域指挥控制（CJADC2）体系的融合

专知会员服务

10+阅读 · 6月25日

网状网络及其在军事领域的运用

网状网络及其在军事领域的运用

专知会员服务

8+阅读 · 6月25日

《意识即战场——全球安全体系中认知战的演进：乌克兰构建认知作战体系的展望》

《意识即战场——全球安全体系中认知战的演进：乌克兰构建认知作战体系的展望》

专知会员服务

9+阅读 · 6月25日

无美国参与的欧洲战争方式（万字长文）

无美国参与的欧洲战争方式（万字长文）

专知会员服务

8+阅读 · 6月25日

重构“下一场战争”的制胜理论：超越兰彻斯特方程与现代系统

重构“下一场战争”的制胜理论：超越兰彻斯特方程与现代系统

专知会员服务

10+阅读 · 6月25日

《国防工业中基于模型定义的实施：产品定义数字化转型的战略路径》90页

《国防工业中基于模型定义的实施：产品定义数字化转型的战略路径》90页

专知会员服务

9+阅读 · 6月25日

相关VIP内容

【2023新书】使用Python进行统计和数据可视化，554页pdf

【2023新书】使用Python进行统计和数据可视化，554页pdf

专知会员服务

130+阅读 · 2023年1月29日

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

76+阅读 · 2022年6月28日

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

【干货书】机器学习速查手册，135页pdf

【干货书】机器学习速查手册，135页pdf

专知会员服务

128+阅读 · 2020年11月20日

2020数据工程师成长路线图

专知会员服务

41+阅读 · 2020年9月6日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

【电子书推荐】Data Science with Python and Dask

【电子书推荐】Data Science with Python and Dask

专知会员服务

44+阅读 · 2019年6月1日

热门VIP内容

开通专知VIP会员享更多权益服务

ICML 2026 | 自回归Boltzmann生成器重塑分子采样

无人机自主控制与人工智能：系统性综述

现代战争的隐蔽系统：伊朗战争十大启示

GNN跨域综述：从消息传递到图基础模型

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

深度自进化聚类：Deep Self-Evolution Clustering

深度自进化聚类：Deep Self-Evolution Clustering

我爱读PAMI

15+阅读 · 2019年4月13日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

LibRec 精选：推荐系统的论文与源码

LibRec 精选：推荐系统的论文与源码

LibRec智能推荐

14+阅读 · 2018年11月29日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

【推荐】RNN/LSTM时序预测

【推荐】RNN/LSTM时序预测

机器学习研究会

25+阅读 · 2017年9月8日

【推荐】SVM实例教程

【推荐】SVM实例教程

机器学习研究会

17+阅读 · 2017年8月26日

相关论文

A Class of Models for Large Zero-inflated Spatial Data

Arxiv

0+阅读 · 2023年4月5日

Partitioning Hypergraphs is Hard: Models, Inapproximability, and Applications

Arxiv

0+阅读 · 2023年4月5日

Many Data: Combine Experimental and Observational Data through a Power Likelihood

Arxiv

0+阅读 · 2023年4月5日

Learning from data with structured missingness

Arxiv

0+阅读 · 2023年4月4日

Data-graph repairs: the preferred approach

Arxiv

0+阅读 · 2023年4月3日

A Revenue Function for Comparison-Based Hierarchical Clustering

Arxiv

0+阅读 · 2023年4月2日

Factorization of Multi-Agent Sampling-Based Motion Planning

Arxiv

0+阅读 · 2023年4月1日

Efficiently transporting average treatment effects using a sufficient subset of effect modifiers

Arxiv

0+阅读 · 2023年3月31日

Updating Embeddings for Dynamic Knowledge Graphs

Arxiv

20+阅读 · 2021年9月22日

Contrastive Clustering

Arxiv

31+阅读 · 2020年9月21日

相关基金

ARB抑制miR-193a表达促进早期糖尿病肾病壁层上皮细胞-足细胞转分化研究

国家自然科学基金

0+阅读 · 2015年12月31日

重离子储存环CSRe上激光冷却相对论能量类锂12C3+离子束的实验研究

国家自然科学基金

0+阅读 · 2015年12月31日

Heisenberg 群上的 k-平面变换

国家自然科学基金

0+阅读 · 2015年12月31日

关于 Finsler 流形上调和映射与 Laplacian 的若干问题研究

国家自然科学基金

1+阅读 · 2014年12月31日

TMS1基因响应高温胁迫和ER Stress的分子机制

国家自然科学基金

0+阅读 · 2014年12月31日

拟南芥钙依赖型蛋白激酶CPK32响应低氮胁迫的分子机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

幂零李群上热核估计的几个问题

国家自然科学基金

0+阅读 · 2012年12月31日

As(III)在二氧化钛表面的光促吸附机制

国家自然科学基金

0+阅读 · 2012年12月31日

原子转移自由基聚合模板合成尺寸精确可控的Ge,GeO2纳米材料及应用

国家自然科学基金

0+阅读 · 2011年12月31日

含氮石墨烯作为电催化材料的构筑与性能研究

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员