Computerised clinical coding approaches aim to automate the process of assigning a set of codes to medical records. While active research continues to push the state of the art in clinical coding for hospitalised patients, the outpatient setting, where doctors treat non-hospitalised patients, has been overlooked. Although both settings can be formalised as multi-label classification tasks, they present distinct challenges, which raises the question of whether the success of inpatient clinical coding approaches translates to the outpatient setting. This paper is the first to investigate how well state-of-the-art deep-learning-based clinical coding approaches work in the outpatient setting at hospital scale. To this end, we collect a large outpatient dataset comprising over 7 million notes documenting over half a million patients. We adapt four state-of-the-art clinical coding approaches to this setting and evaluate their potential to assist coders. We find evidence that clinical coding in outpatient settings can benefit from further innovations on popular inpatient coding benchmarks. A deeper analysis of the factors contributing to this success (the amount and form of data, and the choice of document representation) reveals the presence of easy-to-solve examples whose coding can be completely automated with a low error rate.