Rapid and accurate identification of venous thromboembolism (VTE), a severe cardiovascular condition that includes deep vein thrombosis (DVT) and pulmonary embolism (PE), is important for effective treatment. Automated methods that apply Natural Language Processing (NLP) to radiology reports have shown promise in identifying VTE events from retrospective data cohorts or in aiding clinical experts who review those reports. However, effectively training Deep Learning (DL) and NLP models is challenging due to limited labeled medical text data, the complexity and heterogeneity of radiology reports, and data imbalance. This study proposes a novel combination of DL methods with data augmentation, adaptive pre-trained NLP model selection, and a clinical expert NLP rule-based classifier to improve the accuracy of VTE identification in unstructured (free-text) radiology reports. In our experiments, the model achieves 97\% accuracy and a 97\% F1 score in predicting DVT, and 98.3\% accuracy and a 98.4\% F1 score in predicting PE. These findings indicate the model's robustness and its potential to contribute significantly to VTE research.