Automated clinical coding using off-the-shelf large language models

The task of assigning diagnostic ICD codes to patient hospital admissions is typically performed by expert human coders. Efforts towards automated ICD coding are dominated by supervised deep learning models. However, difficulties in learning to predict the large number of rare codes remain a barrier to adoption in clinical practice. In this work, we leverage off-the-shelf pre-trained generative large language models (LLMs) to develop a practical solution that is suitable for zero-shot and few-shot code assignment. Unsupervised pre-training alone does not guarantee precise knowledge of the ICD ontology or of the specialist clinical coding task; we therefore frame the task as information extraction, providing a description of each coded concept and asking the model to retrieve related mentions. For efficiency, rather than iterating over all codes, we leverage the hierarchical nature of the ICD ontology to sparsely search for relevant codes. Then, in a second stage, which we term 'meta-refinement', we utilise GPT-4 to select a subset of the relevant labels as predictions. We validate our method using Llama-2, GPT-3.5 and GPT-4 on the CodiEsp dataset of ICD-coded clinical case documents. Our tree-search method achieves state-of-the-art performance on rarer classes, with the best macro-F1 of 0.225 and a lower micro-F1 of 0.157, compared with 0.216 and 0.219, respectively, for PLM-ICD. To the best of our knowledge, this is the first method for automated ICD coding requiring no task-specific learning.
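
The sparse search over the ICD hierarchy can be illustrated with a minimal sketch. It assumes a tree of chapter, block and code nodes and a yes/no relevance check per node: a node's children are queried only when the node itself is judged relevant, so entire subtrees are skipped. The names `CodeNode`, `tree_search` and `keyword_oracle` are hypothetical; the keyword matcher is a toy stand-in for an LLM prompt that gives a concept's description and asks for related mentions, and the small ontology fragment uses abbreviated ICD-10 descriptions for illustration only. It is a sketch under these assumptions, not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class CodeNode:
    code: str                          # chapter range, block range or leaf code
    description: str                   # concept description shown to the model
    keywords: tuple[str, ...] = ()     # used only by the toy oracle below
    children: list["CodeNode"] = field(default_factory=list)


def tree_search(roots: list[CodeNode],
                is_relevant: Callable[[CodeNode], bool]) -> tuple[list[str], int]:
    """Depth-first search that expands a node only if it is judged relevant.

    Returns the relevant leaf codes and the number of relevance queries made.
    """
    found: list[str] = []
    queries = 0
    stack = list(roots)
    while stack:
        node = stack.pop()
        queries += 1
        if not is_relevant(node):
            continue  # prune: no descendant of this node is ever queried
        if node.children:
            stack.extend(node.children)
        else:
            found.append(node.code)
    return sorted(found), queries


def keyword_oracle(document: str) -> Callable[[CodeNode], bool]:
    """Hypothetical stand-in for asking an LLM whether the text mentions a concept."""
    text = document.lower()
    return lambda node: any(k in text for k in node.keywords)


# Toy fragment of an ICD-10-style hierarchy (illustrative, heavily abbreviated).
ontology = [
    CodeNode("I00-I99", "Diseases of the circulatory system", ("hypertension", "infarction"), [
        CodeNode("I10-I15", "Hypertensive diseases", ("hypertension",), [
            CodeNode("I10", "Essential (primary) hypertension", ("hypertension",)),
        ]),
        CodeNode("I20-I25", "Ischaemic heart diseases", ("infarction",), [
            CodeNode("I21", "Acute myocardial infarction", ("infarction",)),
        ]),
    ]),
    CodeNode("J00-J99", "Diseases of the respiratory system", ("asthma",), [
        CodeNode("J40-J47", "Chronic lower respiratory diseases", ("asthma",), [
            CodeNode("J45", "Asthma", ("asthma",)),
        ]),
    ]),
]

doc = "Patient with long-standing hypertension admitted with chest pain."
codes, n_queries = tree_search(ontology, keyword_oracle(doc))
print(codes, n_queries)  # → ['I10'] 5
```

Here 5 of the 7 nodes are queried; on a full ontology the saving is larger, because most chapters are pruned at the first level. In the method described above, the surviving candidate codes would then go to the meta-refinement stage, which this sketch does not model.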
