Automated clinical coding using off-the-shelf large language models

The task of assigning diagnostic ICD codes to patient hospital admissions is typically performed by expert human coders. Efforts towards automated ICD coding are dominated by supervised deep learning models. However, the difficulty of learning to predict the large number of rare codes remains a barrier to adoption in clinical practice. In this work, we leverage off-the-shelf pre-trained generative large language models (LLMs) to develop a practical solution that is suitable for zero-shot and few-shot code assignment. Because unsupervised pre-training alone does not guarantee precise knowledge of the ICD ontology or of the specialist clinical coding task, we frame the task as information extraction: we provide a description of each coded concept and ask the model to retrieve related mentions. For efficiency, rather than iterating over all codes, we leverage the hierarchical nature of the ICD ontology to search sparsely for relevant codes. Then, in a second stage, which we term 'meta-refinement', we use GPT-4 to select a subset of the relevant labels as predictions. We validate our method using Llama-2, GPT-3.5 and GPT-4 on the CodiEsp dataset of ICD-coded clinical case documents. Our tree-search method achieves state-of-the-art performance on rarer classes, with the best macro-F1 of 0.225, but a lower micro-F1 of 0.157; PLM-ICD obtains 0.216 and 0.219 respectively. To the best of our knowledge, this is the first method for automated ICD coding that requires no task-specific learning.
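
The sparse hierarchical search described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `CodeNode` class, the `tree_search` function, the `keywords` field and the `keyword_stub` relevance check are all hypothetical names introduced here, and the keyword match is only a stand-in for the LLM query that asks whether a code description is mentioned in the document. The five-code ontology is a toy example, and the second-stage meta-refinement is not shown.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class CodeNode:
    """One node of a toy ICD-style hierarchy (hypothetical structure)."""
    code: str                 # identifier of the chapter or code
    description: str          # text that would be shown to the LLM
    keywords: List[str] = field(default_factory=list)      # used only by the stub
    children: List["CodeNode"] = field(default_factory=list)


def tree_search(root: CodeNode, document: str,
                is_relevant: Callable[[CodeNode, str], bool]) -> List[str]:
    """Return leaf codes judged relevant, descending only into relevant branches."""
    found: List[str] = []
    frontier = list(root.children)
    while frontier:
        node = frontier.pop()
        if not is_relevant(node, document):
            continue                        # prune the whole subtree
        if node.children:
            frontier.extend(node.children)  # refine within a relevant branch
        else:
            found.append(node.code)         # relevant leaf: candidate label
    return sorted(found)


def keyword_stub(node: CodeNode, document: str) -> bool:
    """Assumed stand-in for the LLM relevance query: plain keyword matching."""
    text = document.lower()
    return any(keyword in text for keyword in node.keywords)


# Toy ontology for illustration only; the real ICD hierarchy is far larger.
TOY_ROOT = CodeNode("ROOT", "All codes", children=[
    CodeNode("E", "Endocrine, nutritional and metabolic diseases",
             ["diabetes", "thyroid"], [
                 CodeNode("E11", "Type 2 diabetes mellitus", ["type 2 diabetes"]),
                 CodeNode("E03", "Other hypothyroidism", ["hypothyroidism"]),
             ]),
    CodeNode("I", "Diseases of the circulatory system",
             ["hypertension", "infarction"], [
                 CodeNode("I10", "Essential (primary) hypertension", ["hypertension"]),
                 CodeNode("I21", "Acute myocardial infarction", ["infarction"]),
             ]),
    CodeNode("J", "Diseases of the respiratory system",
             ["asthma", "pneumonia"], [
                 CodeNode("J45", "Asthma", ["asthma"]),
             ]),
])

if __name__ == "__main__":
    note = "Patient with type 2 diabetes and long-standing hypertension."
    print(tree_search(TOY_ROOT, note, keyword_stub))
```

In this toy run the respiratory branch is rejected at the chapter level, so its child codes are never queried. That pruning is what keeps the number of relevance queries well below the total number of codes when the hierarchy is large.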

