In recent years, estimating the duration of medical interventions from electronic health records (EHRs) has gained significant attention in the field of clinical decision support. However, current models focus largely on structured data and leave out the information contained in unstructured clinical free text. To address this, we present a novel language-enhanced transformer-based framework that projects all relevant clinical data modalities (continuous, categorical, binary, and free-text features) into a harmonized language latent space, using a pre-trained sentence encoder with the help of medical prompts. The proposed method integrates information from the different modalities within the cell transformer encoder and leads to more accurate duration estimation for medical interventions. Experimental results on both a US-based dataset (ICU length-of-stay estimation) and an Asian dataset (surgical duration prediction) demonstrate the effectiveness of the proposed framework, which outperforms tailored baseline approaches and is robust to data corruption in EHRs.
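The projection step described above can be pictured with a minimal sketch: each feature, whatever its modality, is first written out as a short prompt sentence and then embedded, so that every feature ends up as one vector in the same space. Everything named here is an assumption made for illustration, not the method's actual implementation: the prompt templates, the feature names, the embedding size, and in particular `toy_sentence_encoder`, a hashed bag-of-words stand-in used only so the example runs without downloading a pre-trained sentence encoder.

```python
import hashlib
import math

EMBED_DIM = 16  # illustrative size; a real sentence encoder is much wider


def build_prompt(name, value, kind):
    """Turn one EHR feature into a short sentence (hypothetical templates)."""
    if kind in ("continuous", "categorical"):
        return f"The patient's {name} is {value}."
    if kind == "binary":
        state = "has" if value else "does not have"
        return f"The patient {state} {name}."
    if kind == "text":
        return f"Clinical note on {name}: {value}"
    raise ValueError(f"unknown feature kind: {kind}")


def toy_sentence_encoder(sentence):
    """Stand-in for a pre-trained sentence encoder (assumption):
    hashed bag of words, L2-normalised."""
    vec = [0.0] * EMBED_DIM
    for token in sentence.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[digest[0] % EMBED_DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def embed_record(record):
    """Map every feature of one record to a vector in the same space,
    giving one embedding per feature."""
    return [toy_sentence_encoder(build_prompt(n, v, k)) for n, v, k in record]


# Hypothetical record mixing the four modalities named in the abstract.
record = [
    ("age", 67, "continuous"),
    ("admission type", "emergency", "categorical"),
    ("diabetes", True, "binary"),
    ("planned procedure", "laparoscopic cholecystectomy", "text"),
]
cells = embed_record(record)
```

In a full model, the per-feature vectors in `cells` would presumably form the input sequence of a transformer encoder, followed by a regression head that predicts the duration; that part is omitted here.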