Early identification of adverse drug events (ADEs) is critical for taking prompt action when new drugs are introduced to the market. ADE information is available through various unstructured data sources such as clinical study reports, patient health records, and social media posts. Extracting ADEs and the related suspect drugs using machine learning is challenging because of the complex linguistic relations between drug-ADE pairs in text and the lack of large labelled corpora. This paper introduces ADEQA, a question-answer (QA) based approach that uses quasi-supervised labelled data and sequence-to-sequence transformers to extract ADEs, suspect drugs, and the relationships between them. Unlike traditional QA models, natural language generation (NLG) based models do not require extensive token-level labelling, which significantly lowers the barrier to adoption. On a public ADE corpus, we achieve state-of-the-art results, with an F1 score of 94% on establishing the relationships between ADEs and their respective suspect drugs.
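The QA framing and the relation-level F1 mentioned above can be illustrated with a minimal sketch. The prompt template, the question wording, the example sentence, and the helper names `build_prompt` and `pair_f1` are all assumptions made for illustration and are not taken from ADEQA. The metric shown is a plain set-based F1 over (drug, ADE) pairs, which may differ from the evaluation protocol behind the reported score.

```python
from typing import Iterable, Set, Tuple

Pair = Tuple[str, str]  # (drug, ADE); hypothetical representation of one relation


def build_prompt(question: str, context: str) -> str:
    """Format a question and its source sentence as one text-to-text input.

    The "question: ... context: ..." layout is an assumed template, not
    necessarily the one used by the authors.
    """
    return f"question: {question} context: {context}"


def _normalise(pairs: Iterable[Pair]) -> Set[Pair]:
    # Lower-case and strip so that trivial surface differences do not count as errors.
    return {(drug.strip().lower(), ade.strip().lower()) for drug, ade in pairs}


def pair_f1(predicted: Iterable[Pair], gold: Iterable[Pair]) -> float:
    """Set-based F1 over (drug, ADE) relation pairs."""
    pred_set, gold_set = _normalise(predicted), _normalise(gold)
    true_positives = len(pred_set & gold_set)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_set)
    recall = true_positives / len(gold_set)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Invented example sentence, for illustration only.
    sentence = "The patient developed nausea after starting aspirin."
    print(build_prompt("Which drug is suspected of causing nausea?", sentence))

    gold = [("Aspirin", "nausea")]
    predicted = [("aspirin", "nausea"), ("aspirin", "rash")]
    print(round(pair_f1(predicted, gold), 3))
```

In this framing, a sequence-to-sequence model would receive the formatted prompt and generate the answer as free text, so only the answer string needs to be labelled rather than every token of the sentence.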