Iterative missing value imputation based on feature importance

Many datasets contain missing values for a variety of reasons. Missing values not only make related tasks harder to process but also reduce classification accuracy. The mainstream remedy is missing value imputation, which completes the dataset before it is used. Existing imputation methods estimate the missing parts from the observed values in the original feature space, and they treat all features as equally important during completion, although in practice different features differ in importance. We therefore design an imputation method that takes feature importance into account. The algorithm alternates iteratively between matrix completion and feature importance learning; specifically, matrix completion is driven by a filling loss that incorporates feature importance. Our experimental analysis covers three types of datasets: synthetic datasets with varying noisy features and missing values, real-world datasets with artificially generated missing values, and real-world datasets that originally contain missing values. The results on these datasets consistently show that the proposed method outperforms five existing imputation algorithms. To the best of our knowledge, this is the first work that considers feature importance in the imputation model.
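The alternation described above can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not specify the filling loss or how importance is learned, so the choices below are assumptions made only for illustration. The completion step here is a column-weighted low-rank fit computed by repeated truncated SVD, and the importance step uses the absolute correlation of each completed feature with a class label `y`. The function names (`weighted_low_rank_impute`, `learn_importance`, `iterative_impute`) and the parameters `rank`, `n_inner`, and `n_outer` are hypothetical.

```python
import numpy as np

def weighted_low_rank_impute(X, mask, w, rank=2, n_inner=50):
    """Fill missing entries using a rank-`rank` fit that weights column j by w[j].

    Assumed filling loss (illustrative only): sum_ij w[j] * (Z[i, j] - L[i, j])**2,
    minimized over rank-`rank` matrices L by a truncated SVD of the scaled matrix.
    """
    # Start from the mean of the observed entries in each column.
    observed_sum = np.where(mask, X, 0.0).sum(axis=0)
    col_means = observed_sum / np.maximum(mask.sum(axis=0), 1)
    Z = np.where(mask, X, col_means)
    s = np.sqrt(w)  # scaling columns by sqrt(w) turns the weighted loss into a plain one
    for _ in range(n_inner):
        U, S, Vt = np.linalg.svd(Z * s, full_matrices=False)
        L = (U[:, :rank] * S[:rank]) @ Vt[:rank] / s
        Z = np.where(mask, X, L)  # observed entries are never overwritten
    return Z

def learn_importance(Z, y, eps=1e-3):
    """Assumed importance rule: |Pearson correlation| with the label, mean-normalized."""
    Zc = Z - Z.mean(axis=0)
    yc = y - y.mean()
    denom = np.linalg.norm(Zc, axis=0) * np.linalg.norm(yc) + 1e-12
    w = np.abs(Zc.T @ yc) / denom + eps  # eps keeps every weight strictly positive
    return w * len(w) / w.sum()

def iterative_impute(X, y, rank=2, n_outer=5):
    """Alternate between weighted completion and importance learning."""
    mask = ~np.isnan(X)
    w = np.ones(X.shape[1])  # begin with all features equally important
    for _ in range(n_outer):
        Z = weighted_low_rank_impute(X, mask, w, rank=rank)
        w = learn_importance(Z, y)
    return Z, w
```

Calling `iterative_impute(X, y)` on a float array `X` with `np.nan` marking the missing entries returns the completed matrix and the learned weights. Features that carry little label information receive small weights, so they influence the low-rank fit less in the next completion step.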

