Automated medical coding, an essential task in healthcare operations and delivery, makes unstructured data manageable by predicting medical codes from clinical documents. Recent advances in deep learning and natural language processing have been widely applied to this task. However, the design of neural network architectures for deep learning-based medical coding still lacks a unified view. This review proposes a unified framework that provides a general understanding of the building blocks of medical coding models, and it summarizes recent advanced models under this framework. The framework decomposes medical coding into four main components: encoder modules for text feature extraction, mechanisms for building deep encoder architectures, decoder modules for transforming hidden representations into medical codes, and the use of auxiliary information. Finally, we introduce benchmarks and real-world applications, and we discuss key research challenges and future directions.
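The four-component decomposition can be illustrated with a deliberately tiny, dependency-free sketch that shows only the data flow from a clinical note to a set of predicted codes. It is not a model from the review. The random embedding table (encoder), the single local-averaging layer with a residual connection (deep-architecture mechanism), the label-wise attention decoder with per-code sigmoid outputs, and the use of code descriptions to build label queries (auxiliary information) are all illustrative assumptions, and every function name below is hypothetical.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]


def make_embeddings(vocab: List[str], dim: int, seed: int = 0) -> Dict[str, Vector]:
    """Hypothetical toy embedding table: untrained random vectors, seeded for reproducibility."""
    rng = random.Random(seed)
    return {w: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for w in vocab}


def encoder(tokens: List[str], emb: Dict[str, Vector], dim: int) -> List[Vector]:
    """Component 1 (encoder module): one hidden vector per token; unknown tokens map to zeros."""
    return [emb.get(t, [0.0] * dim) for t in tokens]


def residual_layer(hidden: List[Vector], window: int = 1) -> List[Vector]:
    """Component 2 (deep architecture): local context averaging wrapped in a residual connection."""
    n = len(hidden)
    out = []
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        # Average each dimension over the neighbouring tokens in the window.
        ctx = [sum(col) / (hi - lo) for col in zip(*hidden[lo:hi])]
        # Residual connection: add the layer output back to its input.
        out.append([h + c for h, c in zip(hidden[i], ctx)])
    return out


def code_queries(descriptions: Dict[str, List[str]], emb: Dict[str, Vector], dim: int) -> Dict[str, Vector]:
    """Component 4 (auxiliary information): build each code's query vector from its description."""
    queries = {}
    for code, words in descriptions.items():
        vecs = encoder(words, emb, dim)
        queries[code] = [sum(col) / len(vecs) for col in zip(*vecs)]
    return queries


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def label_attention_decoder(hidden: List[Vector], queries: Dict[str, Vector]) -> Dict[str, float]:
    """Component 3 (decoder): per-code attention over tokens, then an independent sigmoid per code."""
    if not hidden:
        raise ValueError("empty document")
    probs = {}
    for code, q in queries.items():
        scores = [dot(h, q) for h in hidden]
        m = max(scores)  # subtract the max for a stable softmax
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        weights = [w / z for w in weights]
        # Code-specific document vector: attention-weighted sum of token vectors.
        doc = [sum(w * h[d] for w, h in zip(weights, hidden)) for d in range(len(q))]
        probs[code] = sigmoid(dot(doc, q))
    return probs


def predict_codes(tokens: List[str], emb: Dict[str, Vector], queries: Dict[str, Vector],
                  dim: int, threshold: float = 0.5) -> Tuple[List[str], Dict[str, float]]:
    """Multi-label prediction: every code whose probability reaches the threshold is assigned."""
    hidden = residual_layer(encoder(tokens, emb, dim))
    probs = label_attention_decoder(hidden, queries)
    return sorted(c for c, p in probs.items() if p >= threshold), probs


# Usage with a toy vocabulary and two ICD-10-CM codes described in plain text.
DIM = 8
vocab = ["patient", "has", "essential", "hypertension", "type", "2", "diabetes", "mellitus"]
emb = make_embeddings(vocab, DIM)
queries = code_queries({"I10": ["essential", "hypertension"],
                        "E11.9": ["type", "2", "diabetes", "mellitus"]}, emb, DIM)
note = "patient has essential hypertension".split()
predicted, probs = predict_codes(note, emb, queries, DIM)
print(predicted, probs)
```

Because the embeddings are random and untrained, the printed codes carry no clinical meaning; the sketch only shows where each component sits. In practice the encoder would typically be a learned CNN, RNN, or Transformer, and all parameters would be trained jointly with a multi-label objective such as per-code binary cross-entropy.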