The Electronic Health Record (EHR) is an essential part of the modern medical system and affects healthcare delivery, operations, and research. Although EHRs contain structured information, their unstructured text is attracting growing attention and has become an active research field. The success of recent neural Natural Language Processing (NLP) methods has opened a new direction for processing unstructured clinical notes. In this work, we create EHRKit, a Python library for clinical text. The library has two main parts: MIMIC-III-specific functions and task-specific functions. The first part provides interfaces for accessing MIMIC-III NOTEEVENTS data, including basic search, information retrieval, and information extraction. The second part integrates many third-party libraries to support up to 12 off-the-shelf NLP tasks, such as named entity recognition, summarization, and machine translation.