Extracting meaningful drug-related information chunks, such as adverse drug events (ADEs), is crucial for preventing morbidity and saving lives. Most ADEs are reported through unstructured conversations in a medical context, so a general entity recognition approach is not sufficient. In this paper, we propose a new multi-aspect cross-integration framework for drug entity/event detection that captures and aligns different context/language/knowledge properties from drug-related documents. We first construct multi-aspect encoders that describe semantic, syntactic, and medical document contextual information through three slot tagging tasks: main drug entity/event detection, part-of-speech tagging, and general medical named entity recognition. Each encoder then conducts cross-integration with the other contextual information in three ways (key-value cross, attention cross, and feedforward cross), so that the multiple encoders are integrated in depth. Our model outperforms all existing state-of-the-art (SOTA) methods on two widely used tasks: flat entity detection and discontinuous event extraction.
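The abstract names three cross-integration variants but does not define them, so the following is only an illustrative sketch of the general idea of one encoder reading context from another. It uses standard scaled dot-product cross-attention as a stand-in, not the paper's actual key-value, attention, or feedforward cross. The function names (`cross_attend`, `integrate`), the residual-sum fusion, and the toy hidden states are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, keys, values):
    """Scaled dot-product attention: each query row attends over keys/values
    that come from a *different* encoder (assumed reading of 'cross')."""
    dim = len(queries[0])
    context = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys
        ]
        weights = softmax(scores)
        context.append(
            [
                sum(w * v[j] for w, v in zip(weights, values))
                for j in range(len(values[0]))
            ]
        )
    return context


def integrate(main_states, aux_states):
    """Hypothetical fusion step: add the context read from the auxiliary
    encoder back onto the main encoder's states (residual sum)."""
    context = cross_attend(main_states, aux_states, aux_states)
    return [
        [m + c for m, c in zip(m_row, c_row)]
        for m_row, c_row in zip(main_states, context)
    ]


# Toy per-token hidden states (3 tokens, dimension 2); values are made up.
entity_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # main entity/event encoder
pos_states = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]     # part-of-speech encoder
fused = integrate(entity_states, pos_states)
```

In this sketch the fused states keep the shape of the main encoder's states, so a slot tagging layer could consume them unchanged; how the actual framework combines its three cross types is not stated in the abstract.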