The growth of available data, combined with their unstructured nature, has driven increased interest in natural language processing (NLP) techniques that extract value from these data assets, since unstructured text is not suitable for statistical analysis. This work presents a systematic literature review of state-of-the-art advances in transformer-based methods applied to electronic medical records (EMRs) across different NLP tasks. To the best of our knowledge, this is the first comprehensive review of research on transformer-based methods for NLP in the EMR field. The initial query retrieved 99 articles from three public databases, which were filtered down to 65 articles for detailed analysis. The papers were analyzed with respect to the business problem, NLP task, models and techniques, availability of datasets, reproducibility of modeling, language, and exchange format. The paper also discusses limitations of current research and offers recommendations for further research.