Clinical free-text notes contain vital patient information. They are structured into labeled sections, and recognizing these sections has been shown to support clinical decision-making and downstream NLP tasks. In this paper, we advance clinical section segmentation through three key contributions. First, we curate a new de-identified, section-labeled dataset of obstetrics notes, supplementing the medical domains covered by public corpora such as MIMIC-III, on which most existing segmentation approaches are trained. Second, we systematically evaluate transformer-based supervised models for section segmentation on a curated subset of MIMIC-III (in-domain) and on the new obstetrics dataset (out-of-domain). Third, we conduct the first head-to-head comparison of supervised models for medical section segmentation with zero-shot large language models. Our results show that while supervised models perform strongly in-domain, their performance drops substantially out-of-domain. In contrast, zero-shot models demonstrate robust out-of-domain adaptability once hallucinated section headers are corrected. These findings underscore the importance of developing domain-specific clinical resources and highlight zero-shot segmentation as a promising direction for applying healthcare NLP beyond well-studied corpora, provided that hallucinations are appropriately managed.