Handling missing values well is central to data imputation, particularly for complex datasets. This study presents the full information maximum likelihood (FIML) optimized self-attention (FOSA) framework, which combines FIML estimation with self-attention neural networks. The method first estimates the missing values via FIML and then refines those estimates with a self-attention mechanism. Experiments on both simulated and real-world datasets show that FOSA has clear advantages over traditional FIML techniques in accuracy, computational efficiency, and adaptability to diverse data structures. Notably, even when the structural equation model is misspecified and the FIML estimates are therefore suboptimal, the self-attention component of FOSA corrects and improves the imputation results. In our empirical tests, FOSA continues to deliver reliable predictions at approximately 40% random missingness, indicating robustness and potential for wide application in data imputation.
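The two-stage idea described above (a likelihood-based initial fill, then a self-attention pass that updates only the missing entries) can be illustrated with a minimal sketch. This is not the FOSA implementation. Stage 1 here uses EM for a saturated multivariate normal as a stand-in for FIML under a structural equation model, and stage 2 uses a single untrained attention pass with identity query, key, and value projections, whereas the framework described above uses a trained network. The function names `em_gaussian_impute` and `attention_refine` and all parameters are hypothetical.

```python
import numpy as np

def em_gaussian_impute(X, n_iter=50):
    """Stage 1 stand-in (assumption): EM for a saturated multivariate normal,
    filling missing entries with their conditional means.
    Assumes every column has at least one observed value."""
    X = np.asarray(X, dtype=float)
    miss = np.isnan(X)
    n, p = X.shape
    mu = np.nanmean(X, axis=0)
    filled = np.where(miss, mu, X)
    sigma = np.cov(filled, rowvar=False) + 1e-6 * np.eye(p)
    for _ in range(n_iter):
        extra = np.zeros((p, p))  # summed conditional covariances of missing parts
        for i in range(n):
            m = miss[i]
            if not m.any():
                continue
            o = ~m
            if not o.any():
                # Nothing observed in this row: fall back to the marginal distribution.
                filled[i] = mu
                extra += sigma
                continue
            # B = Sigma_mo Sigma_oo^{-1}: regression of missing on observed coordinates.
            B = np.linalg.solve(sigma[np.ix_(o, o)], sigma[np.ix_(o, m)]).T
            filled[i, m] = mu[m] + B @ (X[i, o] - mu[o])
            extra[np.ix_(m, m)] += sigma[np.ix_(m, m)] - B @ sigma[np.ix_(o, m)]
        # M-step: update mean and covariance from the completed data.
        mu = filled.mean(axis=0)
        centered = filled - mu
        sigma = (centered.T @ centered + extra) / n + 1e-6 * np.eye(p)
    return filled, miss

def attention_refine(filled, miss, temperature=1.0):
    """Stage 2 stand-in (assumption): one untrained self-attention pass over rows
    with Q = K = V = standardized data. Only missing entries are overwritten."""
    mu = filled.mean(axis=0)
    sd = filled.std(axis=0) + 1e-8
    Z = (filled - mu) / sd
    scores = Z @ Z.T / (np.sqrt(Z.shape[1]) * temperature)
    scores -= scores.max(axis=1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    attended = (weights @ Z) * sd + mu  # map back to the original scale
    return np.where(miss, attended, filled)

# Synthetic example with roughly 40% of entries removed at random.
rng = np.random.default_rng(0)
X_true = rng.normal(size=(200, 5))
X_true[:, 1] += 0.8 * X_true[:, 0]  # induce some correlation between columns
mask = rng.random(X_true.shape) < 0.4
X_missing = np.where(mask, np.nan, X_true)

initial, miss = em_gaussian_impute(X_missing)
refined = attention_refine(initial, miss)
```

In this sketch the observed entries pass through unchanged, and only the positions flagged in `miss` are replaced by the attention output. Whether the second stage improves on the first depends on training, which this untrained example does not attempt to show.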