Event extraction in the clinical domain is an under-explored research area. The lack of training data, along with the large amount of domain-specific terminology with vague entity boundaries, makes the task especially challenging. In this paper, we introduce DICE, a robust and data-efficient generative model for clinical event extraction. DICE frames event extraction as a conditional generation problem and introduces a contrastive learning objective to accurately determine the boundaries of biomedical mentions. DICE also trains an auxiliary mention identification task jointly with the event extraction tasks to better identify entity mention boundaries, and further introduces special markers to incorporate identified entity mentions as trigger and argument candidates for their respective tasks. To benchmark clinical event extraction, we compose MACCROBAT-EE, the first clinical event extraction dataset with argument annotation, based on the existing clinical information extraction dataset MACCROBAT. Our experiments demonstrate that DICE achieves state-of-the-art performance on clinical and news-domain event extraction, especially in low-data settings.
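The abstract states only that DICE inserts special markers so that identified entity mentions serve as trigger and argument candidates. The following is a minimal sketch of that kind of input marking, not DICE's actual implementation. It assumes token-level, non-overlapping mention spans (start inclusive, end exclusive) and uses hypothetical marker strings `<m>` and `</m>`. The real marker tokens and input format are not given here.

```python
from typing import List, Tuple

# Hypothetical marker tokens; the abstract does not name DICE's actual markers.
MENTION_START = "<m>"
MENTION_END = "</m>"


def mark_mentions(tokens: List[str], spans: List[Tuple[int, int]]) -> List[str]:
    """Wrap each mention span (start inclusive, end exclusive) in marker tokens.

    Spans must be non-empty, within bounds, and non-overlapping.
    """
    ordered = sorted(spans)
    for s, e in ordered:
        if not 0 <= s < e <= len(tokens):
            raise ValueError(f"invalid span {(s, e)}")
    for (_, e1), (s2, _) in zip(ordered, ordered[1:]):
        if s2 < e1:
            raise ValueError("overlapping spans")

    starts = {s for s, _ in ordered}
    ends = {e for _, e in ordered}
    out: List[str] = []
    for i, tok in enumerate(tokens):
        # Close a mention ending here before opening one that starts here,
        # so adjacent mentions stay correctly nested.
        if i in ends:
            out.append(MENTION_END)
        if i in starts:
            out.append(MENTION_START)
        out.append(tok)
    # A mention that ends at the last token is closed after the loop.
    if len(tokens) in ends:
        out.append(MENTION_END)
    return out


tokens = ["Patient", "reported", "chest", "pain", "and", "fever"]
print(" ".join(mark_mentions(tokens, [(2, 4), (5, 6)])))
# Patient reported <m> chest pain </m> and <m> fever </m>
```

In a generative setup, a marked sequence like this would serve as the conditioning input. Making the candidate boundaries explicit in the text is one simple way to pass mention-identification output to the trigger and argument tasks.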