Automatic diagnosis (AD), a critical application of AI in healthcare, uses machine learning to assist doctors in gathering patient symptom information for accurate disease diagnosis. The Transformer-based method takes a symptom sequence as input, predicts that sequence auto-regressively, and uses the hidden state of the final symptom to predict the disease. Despite its simplicity and strong performance, its diagnosis accuracy declines for two reasons: 1) a mismatch between the symptoms observed during training and those observed during generation, and 2) the sensitivity of disease prediction to symptom order. To address these problems, we introduce CoAD, a novel disease and symptom collaborative generation framework with three key innovations: 1) aligning sentence-level disease labels with multiple possible symptom inquiry steps to bridge the gap between training and generation; 2) expanding the symptom labels for each sub-sequence of symptoms to enrich the annotation and remove the effect of symptom order; 3) a repeated symptom input schema that learns the expanded disease and symptom labels effectively and efficiently. We evaluate CoAD on four datasets, three public and one private, and show that it achieves an average 2.3% improvement over previous state-of-the-art results in automatic disease diagnosis. For reproducibility, we release the code and data at https://github.com/KwanWaiChung/coad.
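The label expansion described in innovations 1) and 2) can be illustrated with a minimal sketch. This is one possible reading of the abstract, not the released CoAD implementation: we assume that every prefix of a symptom sequence receives the disease label, and that the symptom target for a prefix is the unordered set of symptoms not yet observed. The names `PrefixExample` and `expand_labels` are hypothetical.

```python
from typing import List, NamedTuple, Set


class PrefixExample(NamedTuple):
    """One training example derived from a prefix of the symptom sequence (hypothetical type)."""
    prefix: List[str]          # symptoms observed so far
    symptom_targets: Set[str]  # all remaining symptoms, order ignored (assumed reading)
    disease: str               # disease label attached to every step (assumed reading)


def expand_labels(symptoms: List[str], disease: str) -> List[PrefixExample]:
    """Expand one annotated sequence into per-prefix examples.

    Assumption: each prefix gets (a) the sequence-level disease label and
    (b) a multi-label symptom target made of every symptom not yet observed.
    """
    examples = []
    for step in range(1, len(symptoms) + 1):
        prefix = symptoms[:step]
        remaining = set(symptoms[step:])
        examples.append(PrefixExample(prefix, remaining, disease))
    return examples


if __name__ == "__main__":
    for ex in expand_labels(["cough", "fever", "headache"], "flu"):
        print(ex.prefix, sorted(ex.symptom_targets), ex.disease)
```

Under this reading, two annotations that start with the same symptom but list the remaining symptoms in a different order produce the same symptom target at the first step, which is one way the effect of symptom order could be removed. The repeated symptom input schema of innovation 3) is not modeled here.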