To overcome the limitations of manual administrative coding in geriatric Cardiovascular Risk Management, this study introduces an automated classification framework that leverages unstructured Electronic Health Records (EHRs). Using a dataset of 3,482 patients, we benchmarked three modeling paradigms on longitudinal Dutch clinical narratives: classical machine learning baselines, specialized deep learning architectures optimized for long-context sequences, and general-purpose generative Large Language Models (LLMs) in a zero-shot setting. We also evaluated a late fusion strategy that integrates the unstructured text with structured medication embeddings and anthropometric data. Our analysis shows that the custom Transformer architecture outperforms both the classical methods and the generative LLMs, achieving the highest F1-scores and Matthews Correlation Coefficients. These findings underscore the role of specialized hierarchical attention mechanisms in capturing long-range dependencies within medical texts, and they present a robust, automated alternative to manual workflows for clinical risk stratification.