Computerised clinical coding approaches aim to automate the process of assigning a set of codes to medical records. While active research continues to push the state of the art on clinical coding for hospitalised patients, the outpatient setting -- where doctors tend to non-hospitalised patients -- is overlooked. Although both settings can be formalised as a multi-label classification task, each presents distinct challenges, which raises the question of whether the success of inpatient clinical coding approaches translates to the outpatient setting. This paper is the first to investigate how well state-of-the-art deep learning-based clinical coding approaches work in the outpatient setting at hospital scale. To this end, we collect a large outpatient dataset comprising over 7 million notes documenting over half a million patients. We adapt four state-of-the-art clinical coding approaches to this setting and evaluate their potential to assist coders. We find evidence that clinical coding in outpatient settings can benefit from more innovations developed on popular inpatient coding benchmarks. A deeper analysis of the factors contributing to this success -- the amount and form of data and the choice of document representation -- reveals the presence of easy-to-solve examples, the coding of which can be completely automated with a low error rate.