Multiple imputation (MI) is widely applied to missing-value problems in biomedical, social, and econometric research to avoid improper inference in downstream data analysis. With high-dimensional data, imputation models that include feature selection, especially $\ell_1$-regularized regression (such as Lasso, adaptive Lasso, and Elastic Net), are common choices for keeping the model from being underdetermined. However, conducting MI with feature selection is difficult: existing methods are often computationally inefficient and perform poorly. We propose MISNN, a novel and efficient algorithm that incorporates feature selection into MI. Leveraging the approximation power of neural networks, MISNN is a general and flexible framework, compatible with any feature selection method, any neural network architecture, high- or low-dimensional data, and general missing patterns. In empirical experiments, MISNN demonstrates clear advantages over state-of-the-art imputation methods (e.g., Bayesian Lasso and matrix completion) in imputation accuracy, statistical consistency, and computation speed.