The healthcare environment is commonly described as "information-rich" yet "knowledge-poor". Healthcare systems collect huge amounts of data from various sources: lab reports, medical letters, logs of medical tools or programs, medical prescriptions, etc. These massive data sets can yield knowledge that improves medical services and the healthcare domain as a whole, for example disease prediction through analysis of a patient's symptoms, or disease prevention through the discovery of behavioral factors associated with diseases. Unfortunately, only a relatively small volume of textual eHealth data is processed and interpreted, an important factor being the difficulty of performing Big Data operations efficiently. In the medical field, detecting domain-specific multi-word terms is a crucial task, as such terms can define an entire concept in a few words. A term can be defined as a linguistic structure or a concept, composed of one or more words, that has a specific meaning within a domain. Together, the terms of a domain form its terminology. This chapter offers a critical study of the current best-performing solutions for analyzing unstructured (image and textual) eHealth data. It also compares current Natural Language Processing and Deep Learning techniques in the eHealth context. Finally, we examine and discuss some of the open issues and define a set of research directions in this area.