Generating Medical Prescriptions with Conditional Transformer

Access to real-world medication prescriptions is essential for medical research and healthcare quality improvement. However, such access is often limited because the information they contain is sensitive. Additionally, manually labelling these prescriptions to train and fine-tune Natural Language Processing (NLP) models is tedious and expensive. We introduce a novel task-specific model architecture, Label-To-Text-Transformer (LT3), tailored to generate synthetic medication prescriptions from provided labels, such as a vocabulary list of medications and their attributes. LT3 is trained on around 2K lines of medication prescriptions extracted from the MIMIC-III database, which allows the model to produce valuable synthetic medication prescriptions. We evaluate LT3 by comparing it with a state-of-the-art Pre-trained Language Model (PLM), T5, analysing the quality and diversity of the generated texts. We then use the generated synthetic data to train the SpacyNER model for the Named Entity Recognition (NER) task on the n2c2-2018 dataset. The experiments show that the model trained on synthetic data achieves F1 scores of 96-98% for label recognition on Drug, Frequency, Route, Strength, and Form. LT3 code and data will be shared at https://github.com/HECTA-UoM/Label-To-Text-Transformer
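To illustrate how label-conditioned synthetic text can feed an NER model, the following is a minimal sketch, not the LT3 pipeline itself. It assumes each generated prescription comes with the input labels that produced it, and it converts the pair into the character-offset annotation shape (`text`, `{"entities": [(start, end, label)]}`) commonly used for NER training data. The function names, the label keys, and the sample prescription are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# (start_char, end_char, label): character-offset span annotation.
Span = Tuple[int, int, str]


def label_spans(text: str, labels: Dict[str, str]) -> List[Span]:
    """Locate each label value in the text (case-insensitive, first match only)."""
    lowered = text.lower()
    spans: List[Span] = []
    for label, value in labels.items():
        start = lowered.find(value.lower())
        if start == -1:
            # The generated text did not contain this label value; skip it.
            continue
        spans.append((start, start + len(value), label))
    # Sort by start offset so the spans follow reading order.
    return sorted(spans)


def to_training_example(text: str, labels: Dict[str, str]):
    """Pair a prescription with its span annotations."""
    return text, {"entities": label_spans(text, labels)}


# Hypothetical labels and a hypothetical generated prescription.
labels = {
    "DRUG": "aspirin",
    "STRENGTH": "81 mg",
    "FORM": "tablet",
    "ROUTE": "PO",
    "FREQUENCY": "daily",
}
text = "Aspirin 81 mg tablet PO daily"
example = to_training_example(text, labels)
print(example)
```

Plain substring search is a deliberate simplification: a short value such as "PO" could match inside a longer word, and overlapping spans are not resolved, so a real pipeline would need token-aware matching.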

