Background: The semantics of entities extracted from clinical text can be dramatically altered by modifiers, including entity negation, uncertainty, conditionality, severity, and subject. Existing models for determining the modifiers of clinical entities rely on regular expressions or on feature weights that are trained independently for each modifier.

Methods: We develop and evaluate a multi-task transformer architecture in which modifiers are learned and predicted jointly, using the publicly available SemEval 2015 Task 14 corpus and a new Opioid Use Disorder (OUD) data set that contains modifiers shared with SemEval as well as novel modifiers specific to OUD. We compare the effectiveness of our multi-task learning approach against previously published systems and assess the feasibility of transfer learning for clinical entity modifiers when only a portion of the modifiers are shared.

Results: Our approach achieved state-of-the-art results on the ShARe corpus from SemEval 2015 Task 14, improving weighted accuracy by 1.1%, unweighted accuracy by 1.7%, and micro F1 score by 10%.

Conclusions: We show that weights learned by our shared model can be effectively transferred to a new, partially matched data set, validating the use of transfer learning for clinical text modifiers.
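The joint prediction and partial transfer described above can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation, and several parts are assumptions: the transformer encoder is replaced by a fixed entity vector `h`, each modifier gets its own hypothetical linear softmax head, the multi-task loss is assumed to be the unweighted sum of per-modifier cross-entropies, and the label schemas (including the placeholder `hypothetical_oud_modifier`) are illustrative rather than taken from either corpus.

```python
import copy
import math
import random

# Illustrative schemas: modifier name -> possible values (assumed, not the official label sets).
SOURCE_SCHEMA = {
    "negation": ["no", "yes"],
    "uncertainty": ["no", "yes"],
    "conditional": ["false", "true"],
    "severity": ["unmarked", "slight", "moderate", "severe"],
    "subject": ["patient", "family_member", "other"],
}
TARGET_SCHEMA = {
    "negation": ["no", "yes"],
    "uncertainty": ["no", "yes"],
    "subject": ["patient", "family_member", "other"],
    "hypothetical_oud_modifier": ["absent", "present"],  # hypothetical OUD-only modifier
}


def init_head(n_in, n_out, rng):
    """Hypothetical linear classification head: weights (n_out x n_in) plus bias."""
    return {
        "W": [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)],
        "b": [0.0] * n_out,
    }


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def predict_all(heads, h):
    """Predict every modifier jointly from ONE shared entity representation h."""
    out = {}
    for name, head in heads.items():
        logits = [sum(w * x for w, x in zip(row, h)) + b
                  for row, b in zip(head["W"], head["b"])]
        out[name] = softmax(logits)
    return out


def joint_loss(heads, h, gold):
    """Assumed multi-task loss: sum of per-modifier cross-entropies (gold = value indices)."""
    probs = predict_all(heads, h)
    return sum(-math.log(probs[name][idx]) for name, idx in gold.items())


def transfer_heads(source_heads, source_schema, target_schema, n_in, rng):
    """Copy heads for modifiers with identical label sets; initialize new heads otherwise."""
    target = {}
    for name, values in target_schema.items():
        if source_schema.get(name) == values:
            target[name] = copy.deepcopy(source_heads[name])  # shared: reuse learned weights
        else:
            target[name] = init_head(n_in, len(values), rng)  # novel: fresh head
    return target


if __name__ == "__main__":
    rng = random.Random(0)
    HIDDEN = 8  # assumed size of the shared encoder output
    source_heads = {n: init_head(HIDDEN, len(v), rng) for n, v in SOURCE_SCHEMA.items()}
    h = [rng.gauss(0.0, 1.0) for _ in range(HIDDEN)]  # stand-in for an entity embedding
    gold = {"negation": 1, "uncertainty": 0, "conditional": 0, "severity": 0, "subject": 0}
    print(joint_loss(source_heads, h, gold))
    target_heads = transfer_heads(source_heads, SOURCE_SCHEMA, TARGET_SCHEMA, HIDDEN, rng)
    print(sorted(n for n in target_heads
                 if n in source_heads and target_heads[n] == source_heads[n]))
    # → ['negation', 'subject', 'uncertainty']
```

Because every head reads the same representation, weights learned on one corpus remain meaningful for any modifier whose label set is unchanged; in this sketch only the novel head starts from scratch and would then be fine-tuned on the target data.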