Adverse Drug Event (ADE) extraction is one of the core tasks in digital pharmacovigilance, especially when applied to informal texts. The Natural Language Processing community has addressed this task with large pre-trained language models, such as BERT. Despite the large number of Transformer-based architectures used in the literature, it is unclear which of them performs better and why. Therefore, in this paper we perform an extensive evaluation and analysis of 19 Transformer-based models for ADE extraction on informal texts. We compare the performance of all the considered models on two datasets with increasing levels of informality (forum posts and tweets). We also combine the purely Transformer-based models with two commonly used additional processing layers (CRF and LSTM) and analyze their effect on the models' performance. Furthermore, we use a well-established feature importance technique (SHAP) to correlate the performance of the models with a set of features that describe them: model category (AutoEncoding, AutoRegressive, Text-to-Text), pretraining domain, training from scratch, and model size in number of parameters. We close our analyses with a list of take-home messages derived from the experimental data.