An adverse drug effect (ADE) is any harmful event resulting from medical drug treatment. Despite their importance, ADEs are often under-reported in official channels. Some research has therefore turned to detecting discussions of ADEs in social media. Although these efforts have reported impressive results, a high-stakes domain such as medicine demands an in-depth evaluation of a model's abilities. We address the issue of thorough performance evaluation in English-language ADE detection with hand-crafted templates for four capabilities: temporal order, negation, sentiment, and beneficial effect. We find that models with similar performance on held-out test sets have varying results on these capabilities.
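To make the idea of template-based capability testing concrete, the following is a minimal sketch, not the procedure used in this work. The templates, the drug and effect lists, the expected labels (1 = ADE, 0 = no ADE), and the `keyword_baseline` classifier are all hypothetical and chosen only for illustration. The sketch fills each template with every drug and effect combination and reports a per-capability failure rate for a given prediction function.

```python
from itertools import product

# Hypothetical templates, one per capability, each paired with an assumed
# expected label (1 = ADE, 0 = no ADE). None of these come from the paper.
TEMPLATES = {
    "temporal_order": ("I had {effect} long before I started taking {drug}.", 0),
    "negation": ("I took {drug} and did not get any {effect}.", 0),
    "sentiment": ("I love {drug}, but it gave me {effect}.", 1),
    "beneficial_effect": ("Taking {drug} finally got rid of my {effect}.", 0),
}

# Hypothetical fill-in values.
DRUGS = ["ibuprofen", "sertraline"]
EFFECTS = ["nausea", "headaches"]


def build_cases():
    """Expand every template with every (drug, effect) combination."""
    cases = []
    for capability, (template, label) in TEMPLATES.items():
        for drug, effect in product(DRUGS, EFFECTS):
            text = template.format(drug=drug, effect=effect)
            cases.append((capability, text, label))
    return cases


def failure_rates(predict, cases):
    """Return the fraction of wrongly classified cases per capability."""
    totals, failures = {}, {}
    for capability, text, label in cases:
        totals[capability] = totals.get(capability, 0) + 1
        if predict(text) != label:
            failures[capability] = failures.get(capability, 0) + 1
    return {c: failures.get(c, 0) / totals[c] for c in totals}


def keyword_baseline(text):
    """Toy stand-in for a trained model: predicts ADE if a symptom word appears."""
    return int(any(effect in text.lower() for effect in EFFECTS))


CASES = build_cases()
RATES = failure_rates(keyword_baseline, CASES)
```

In this sketch the keyword baseline labels every sentence as an ADE, so it passes the hypothetical sentiment cases and fails the other three capabilities entirely. A single aggregate score would not show where such a model breaks down, which is why the failure rate is reported per capability.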