Addressing missing data in complex datasets such as electronic health records (EHR) is critical for ensuring accurate analysis and decision-making in healthcare. This paper proposes a dynamically adaptable structural equation modeling (SEM) approach using self-attention (SESA) for imputing missing data in EHR. SESA goes beyond traditional SEM-based methods by incorporating self-attention mechanisms, which improve the model's adaptability and accuracy across diverse EHR datasets. This allows SESA to dynamically adjust and optimize its imputations, overcoming the limitations of static SEM frameworks. Our experimental analyses show that SESA achieves robust predictive performance and handles missing data in EHR effectively. Moreover, the SESA architecture not only corrects potential misspecifications in the SEM but also works together with causal discovery algorithms to refine its imputation logic based on the underlying data structure. These features highlight SESA's capabilities and its broad potential for applications in EHR data analysis and beyond, marking a notable advance in the field of data imputation.
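To make the contrast between a static SEM and an attention-adaptive one concrete, here is a toy sketch. It is not SESA itself, because the abstract does not specify SESA's architecture. The gating scheme, the per-feature `query` and `key_proj` vectors, and the function names are all hypothetical. A static linear SEM imputes a missing feature `x_j` as the sum of `B[j][k] * x_k` over the observed features, using fixed path coefficients `B`. In this sketch, one row of a scaled dot-product attention map is computed from a learned query for feature `j` (its value is unknown) to keys projected from each observed value. The softmax of those scores is then used to rescale the fixed coefficients record by record. When attention is uniform, every gate equals 1 and the static SEM prediction is recovered exactly.

```python
import math
from typing import List, Optional, Sequence

Record = Sequence[Optional[float]]  # None marks a missing value


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a non-empty list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def static_sem_impute(x: Record, B: Sequence[Sequence[float]], j: int) -> float:
    """Static linear SEM: x_j = sum of B[j][k] * x_k over observed k != j."""
    return sum(B[j][k] * x[k] for k in range(len(x)) if k != j and x[k] is not None)


def attention_sem_impute(
    x: Record,
    B: Sequence[Sequence[float]],
    j: int,
    query: Sequence[Sequence[float]],     # hypothetical learned query per feature
    key_proj: Sequence[Sequence[float]],  # hypothetical key projection per feature
) -> float:
    """Toy attention-gated SEM imputation of feature j (not the SESA architecture)."""
    observed = [k for k in range(len(x)) if k != j and x[k] is not None]
    if not observed:
        return 0.0  # no observed predictors; this toy model has no intercept
    d = len(query[j])
    # Scaled dot-product scores: the query for the missing feature j against
    # keys built by projecting each observed value x_k.
    scores = [
        sum(q * x[k] * w for q, w in zip(query[j], key_proj[k])) / math.sqrt(d)
        for k in observed
    ]
    alphas = softmax(scores)
    # Rescale so the gates average 1: uniform attention gives gate 1 per edge,
    # which reduces exactly to the static SEM prediction.
    gates = [len(observed) * a for a in alphas]
    return sum(g * B[j][k] * x[k] for g, k in zip(gates, observed))


if __name__ == "__main__":
    x = [1.0, 2.0, None]  # feature 2 is missing
    B = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.5, 0.5, 0.0]]
    key_proj = [[1.0, 0.0]] * 3
    print(static_sem_impute(x, B, 2))                                 # 1.5
    print(attention_sem_impute(x, B, 2, [[0.0, 0.0]] * 3, key_proj))  # 1.5 (uniform attention)
    print(attention_sem_impute(x, B, 2, [[0.0, 0.0], [0.0, 0.0], [1.0, 0.0]], key_proj))  # about 1.67
```

With all-zero queries the attention is uniform, so the two imputers agree. A nonzero query lets the weight on each observed feature depend on its value in the current record. That record-level adaptivity is the kind a fixed-coefficient SEM lacks, though how SESA realizes it is described in the full paper, not here.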