This paper focuses on term-status pair extraction from medical dialogues (MD-TSPE), which is essential for diagnosis dialogue systems and for the automatic scribing of electronic medical records (EMRs). In the past few years, MD-TSPE has attracted increasing research attention, especially after the remarkable progress made by generative methods. However, these generative methods output a whole sequence of term-status pairs in a single stage and do not integrate prior knowledge, even though the task demands a deeper understanding to model the relationships between terms and to infer the status of each term. This paper presents a knowledge-enhanced two-stage generative framework (KTGF) to address these challenges. Using task-specific prompts, we employ a single model to complete MD-TSPE in two phases in a unified generative form: we first generate all terms and then generate the status of each generated term. In this way, the relationships between terms can be learned more effectively from a sequence containing only terms in the first phase, and our knowledge-enhanced prompt in the second phase can leverage the category and status candidates of each generated term for status generation. Furthermore, our proposed special status "not mentioned" makes more terms available and enriches the training data in the second phase, which is critical in the low-resource setting. Experiments on the Chunyu and CMDD datasets show that the proposed method achieves superior results compared to state-of-the-art models in both the full-training and low-resource settings.
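The two-phase procedure described above can be illustrated with a minimal sketch. It is not the authors' implementation: the prompt templates, the `KNOWLEDGE` table, the `;` separator, and the `generate` callable (a stand-in for any sequence-to-sequence model) are all hypothetical names chosen for illustration. Dropping terms whose generated status is "not mentioned" at inference time is likewise an assumption, not something the abstract states.

```python
from typing import Callable, Dict, List, Tuple

NOT_MENTIONED = "not mentioned"

# Hypothetical prior knowledge: term -> (category, candidate statuses).
KNOWLEDGE: Dict[str, Tuple[str, List[str]]] = {
    "cough": ("symptom", ["positive", "negative", "unknown", NOT_MENTIONED]),
    "fever": ("symptom", ["positive", "negative", "unknown", NOT_MENTIONED]),
}


def term_prompt(dialogue: str) -> str:
    """Phase 1 prompt (hypothetical template): ask for all terms."""
    return f"extract terms: {dialogue}"


def status_prompt(dialogue: str, term: str) -> str:
    """Phase 2 prompt (hypothetical template): add category and candidates."""
    category, candidates = KNOWLEDGE.get(term, ("unknown", [NOT_MENTIONED]))
    return (
        f"status of term: {term} | category: {category} | "
        f"candidates: {', '.join(candidates)} | dialogue: {dialogue}"
    )


def extract_pairs(
    dialogue: str, generate: Callable[[str], str]
) -> List[Tuple[str, str]]:
    """Run both phases with the same model and return (term, status) pairs."""
    # Phase 1: generate a sequence that contains only terms.
    raw_terms = generate(term_prompt(dialogue))
    terms = [t.strip() for t in raw_terms.split(";") if t.strip()]

    # Phase 2: generate one status per generated term.
    pairs = []
    for term in terms:
        status = generate(status_prompt(dialogue, term)).strip()
        # Assumption: terms judged "not mentioned" are filtered out.
        if status != NOT_MENTIONED:
            pairs.append((term, status))
    return pairs


def toy_generate(prompt: str) -> str:
    """Rule-based stand-in for a trained generative model (illustration only)."""
    if prompt.startswith("extract terms:"):
        return "cough; fever; headache"
    if "term: cough" in prompt:
        return "positive"
    if "term: fever" in prompt:
        return "negative"
    return NOT_MENTIONED


dialogue = "Patient: I have been coughing, but no fever."
result = extract_pairs(dialogue, toy_generate)
print(result)
```

In a real system, `toy_generate` would be replaced by a call to a fine-tuned generative model; the point of the sketch is only that one model serves both phases and that the second-phase prompt carries the category and status candidates for each term.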