Extracting meaningful drug-related information, such as adverse drug events (ADEs), is crucial for preventing morbidity and saving lives. Most ADEs are reported through unstructured conversations set in a medical context, so a general-purpose entity recognition approach is not sufficient. The key challenge is how to integrate and align the multiple aspects needed to detect drug event information: drug event semantics, syntactic structures, and medical domain terminology. In this paper, we propose a new multi-aspect cross-integration framework for drug entity/event detection that captures and aligns different context, language, and knowledge properties from drug-related documents. We first construct multi-aspect encoders that describe semantic, syntactic, and medical document contextual information by performing three slot tagging tasks: main drug entity/event detection, part-of-speech tagging, and general medical named entity recognition. Each encoder then conducts cross integration and alignment with the other contextual information in three ways (key-value cross, attention cross, and feedforward cross), so that the multiple encoders are integrated in depth. We then perform extensive experiments on two widely used drug-related entity recognition downstream tasks: flat entity detection and discontinuous event extraction. Our model significantly outperforms all twelve recent state-of-the-art models. The implementation code will be released at~\url{https://github.com/adlnlp/mc-dre}.
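The abstract does not specify how the three cross-integration modes are computed. As a rough illustration of what a key-value cross between two aspect encoders could look like, the sketch below assumes it resembles standard scaled dot-product cross-attention: queries come from one encoder's hidden states, while keys and values come from another encoder's. The function name `key_value_cross`, the projection matrices, and the random toy inputs are all hypothetical and are not taken from the released implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row-wise max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def key_value_cross(own_states, other_states, w_q, w_k, w_v):
    """Hypothetical key-value cross (an assumption, not the authors' code).

    Queries are projected from one encoder's token states; keys and values
    are projected from a different encoder's token states.
    """
    q = own_states @ w_q      # (seq_len, dim)
    k = other_states @ w_k    # (seq_len, dim)
    v = other_states @ w_v    # (seq_len, dim)
    scores = q @ k.T / np.sqrt(q.shape[-1])  # scaled dot-product scores
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ v, weights


# Toy stand-ins for the hidden states of two aspect encoders.
rng = np.random.default_rng(0)
seq_len, dim = 5, 8
semantic_states = rng.normal(size=(seq_len, dim))   # e.g. drug entity/event encoder
syntactic_states = rng.normal(size=(seq_len, dim))  # e.g. part-of-speech encoder
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) for _ in range(3))

fused, weights = key_value_cross(
    semantic_states, syntactic_states, w_q, w_k, w_v
)
print(fused.shape)
```

Under this assumption, each token representation from the semantic encoder is re-expressed as a weighted mixture of the syntactic encoder's value vectors, which is one plausible way for one aspect to be aligned with another. The attention cross and feedforward cross presumably exchange information at other points of the encoder layer, but the abstract gives no detail, so they are not sketched here.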