MISNN: Multiple Imputation via Semi-parametric Neural Networks

Multiple imputation (MI) has been widely applied to missing value problems in biomedical, social and econometric research, in order to avoid improper inference in the downstream data analysis. For high-dimensional data, imputation models that include feature selection, especially $\ell_1$ regularized regression (such as Lasso, adaptive Lasso, and Elastic Net), are common choices to keep the model from being underdetermined. However, conducting MI with feature selection is difficult: existing methods are often computationally inefficient and perform poorly. We propose MISNN, a novel and efficient algorithm that incorporates feature selection for MI. Leveraging the approximation power of neural networks, MISNN is a general and flexible framework, compatible with any feature selection method, any neural network architecture, high- or low-dimensional data, and general missing patterns. In empirical experiments, MISNN shows clear advantages over state-of-the-art imputation methods (e.g. Bayesian Lasso and matrix completion) in terms of imputation accuracy, statistical consistency and computation speed.
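
For context on why MI supports valid downstream inference, the sketch below pools per-imputation results using Rubin's rules, the standard way to combine analyses of several imputed datasets. It is not part of MISNN and is not taken from the paper; the function name `pool_rubin` and the numeric inputs are hypothetical and serve only as illustration.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m point estimates and their squared standard errors (Rubin's rules)."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)    # pooled point estimate
    w = mean(variances)        # within-imputation variance
    b = variance(estimates)    # between-imputation variance (divides by m - 1)
    t = w + (1 + 1 / m) * b    # total variance of the pooled estimate
    return q_bar, t


# Hypothetical estimates of one regression coefficient from m = 5 imputed datasets.
estimates = [1.0, 1.2, 0.8, 1.1, 0.9]
variances = [0.04, 0.05, 0.04, 0.05, 0.04]
pooled, total_var = pool_rubin(estimates, variances)
print(f"pooled estimate = {pooled:.3f}, total variance = {total_var:.3f}")
```

The between-imputation term is what a single imputation discards: it carries the uncertainty due to the missing values into the final standard error.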
