Medical diagnosis assistant (MDA) aims to build an interactive diagnostic agent that sequentially inquires about symptoms in order to discriminate between diseases. However, since the dialogue records used to build a patient simulator are collected passively, the data may be corrupted by task-unrelated biases, such as the preferences of the collectors. These biases can hinder the diagnostic agent from capturing transportable knowledge from the simulator. This work attempts to address these critical issues in MDA by using a causal diagram to identify and resolve two representative non-causal biases: (i) default-answer bias and (ii) distributional inquiry bias. Specifically, Bias (i) originates from the patient simulator, which answers unrecorded inquiries with biased default answers. Consequently, diagnostic agents cannot fully demonstrate their advantages because of these biased answers. To eliminate this bias, and inspired by the propensity score matching technique with causal diagrams, we propose a propensity-based patient simulator that effectively answers unrecorded inquiries by drawing knowledge from other records. Bias (ii) inherently accompanies passively collected data and is one of the key obstacles to training the agent towards "learning how" rather than "remembering what". For example, if a symptom is highly coupled with a certain disease within the training distribution, the agent might learn to inquire only about that symptom to discriminate that disease, and thus might fail to generalize to out-of-distribution cases. To this end, we propose a progressive assurance agent, which consists of two processes responsible for symptom inquiry and disease diagnosis, respectively. The inquiry process is driven by the diagnosis process in a top-down manner, inquiring about symptoms that enhance diagnostic confidence.
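The abstract does not say how the propensity-based simulator computes its scores, so the following is only a minimal sketch of the general idea under stated assumptions. We assume each stored record holds a disease label plus the symptoms the collector happened to record. We treat "the queried symptom was recorded" as the treatment and fit a small logistic-regression propensity model on the other symptoms. An unrecorded inquiry is then answered by the recorded patient with the nearest propensity score, preferring the same disease, rather than by a fixed default. The record format and the names `encode`, `predict`, `fit_propensity` and `answer` are hypothetical, as is the choice of logistic regression.

```python
import math
from typing import Any, Dict, List, Optional

# Hypothetical record format: {"disease": str, "symptoms": {name: bool}}, where only
# the symptoms the collector happened to record appear in "symptoms".
Record = Dict[str, Any]


def encode(symptoms: Dict[str, bool], vocab: List[str], skip: str) -> List[float]:
    """Covariates: +1 present, -1 absent, 0 unrecorded; the queried symptom is left out."""
    return [
        0.0 if s not in symptoms else (1.0 if symptoms[s] else -1.0)
        for s in vocab
        if s != skip
    ]


def predict(w: List[float], x: List[float]) -> float:
    """Propensity score e(x) = sigmoid(w0 + w . x)."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    return 1.0 / (1.0 + math.exp(-z))


def fit_propensity(xs: List[List[float]], ts: List[int],
                   lr: float = 0.5, epochs: int = 500) -> List[float]:
    """Logistic regression for P(T = 1 | x) by batch gradient descent; w[0] is the bias."""
    w = [0.0] * (len(xs[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, t in zip(xs, ts):
            err = predict(w, x) - t
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(xs) for wj, g in zip(w, grad)]
    return w


def answer(target: Record, symptom: str, records: List[Record],
           vocab: List[str]) -> Optional[bool]:
    """Answer an inquiry; an unrecorded symptom is imputed by propensity score matching."""
    recorded = target["symptoms"]
    if symptom in recorded:  # recorded inquiry: answer from the patient's own record
        return recorded[symptom]
    # Treatment T = 1 if a stored record contains an answer for the queried symptom.
    xs = [encode(r["symptoms"], vocab, symptom) for r in records]
    ts = [1 if symptom in r["symptoms"] else 0 for r in records]
    treated = [i for i, t in enumerate(ts) if t == 1]
    if not treated:
        return None  # no record can supply an answer for this symptom
    w = fit_propensity(xs, ts)
    score = predict(w, encode(recorded, vocab, symptom))
    # Prefer treated records that share the target's disease label, if any exist.
    pool = [i for i in treated if records[i]["disease"] == target["disease"]] or treated
    best = min(pool, key=lambda i: abs(predict(w, xs[i]) - score))
    return records[best]["symptoms"][symptom]


vocab = ["fever", "cough", "sneeze", "rash"]
records = [
    {"disease": "flu", "symptoms": {"fever": True, "cough": True, "sneeze": True}},
    {"disease": "flu", "symptoms": {"fever": True, "sneeze": False}},
    {"disease": "measles", "symptoms": {"fever": True, "rash": True, "sneeze": False}},
    {"disease": "flu", "symptoms": {"cough": True}},
]
patient = {"disease": "flu", "symptoms": {"fever": True, "cough": True}}
print(answer(patient, "sneeze", records, vocab))
```

The patient's record never mentions sneezing. Instead of a fixed default answer, the sketch borrows the answer from the first flu record, whose recorded symptoms and hence propensity score match the patient's exactly, so it prints `True`. Matching on a single scalar score, rather than on the raw symptom vector, is the usual motivation for propensity score matching: it keeps the comparison low-dimensional as the symptom vocabulary grows.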
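The abstract describes the progressive assurance agent's dual processes only at a high level. The following is a minimal sketch of the top-down idea under explicit stand-in assumptions, not the agent's actual design. The diagnosis process is modeled as a naive-Bayes posterior over a hand-written likelihood table. The inquiry process greedily asks the symptom whose answer is expected to raise the top-1 posterior the most, and it stops once that confidence reaches a threshold. The names `diagnose`, `expected_confidence` and `run_agent` are hypothetical, as are the `threshold` and `max_turns` stopping rule and all probabilities.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical likelihood table: lik[disease][symptom] = P(symptom present | disease).
# Entries must lie strictly between 0 and 1 so the posterior never normalizes by zero.
Likelihood = Dict[str, Dict[str, float]]


def diagnose(prior: Dict[str, float], lik: Likelihood,
             answers: Dict[str, bool]) -> Dict[str, float]:
    """Diagnosis process: naive-Bayes posterior over diseases given the answers so far."""
    joint = {}
    for d, p in prior.items():
        for s, present in answers.items():
            p *= lik[d][s] if present else 1.0 - lik[d][s]
        joint[d] = p
    z = sum(joint.values())
    return {d: p / z for d, p in joint.items()}


def expected_confidence(prior: Dict[str, float], lik: Likelihood,
                        answers: Dict[str, bool], symptom: str) -> float:
    """Expected top-1 posterior probability after hearing the answer to `symptom`."""
    post = diagnose(prior, lik, answers)
    p_yes = sum(post[d] * lik[d][symptom] for d in post)
    total = 0.0
    for present, p_ans in ((True, p_yes), (False, 1.0 - p_yes)):
        if p_ans > 0.0:
            total += p_ans * max(diagnose(prior, lik, {**answers, symptom: present}).values())
    return total


def run_agent(prior: Dict[str, float], lik: Likelihood, ask: Callable[[str], bool],
              threshold: float = 0.9, max_turns: int = 5) -> Tuple[str, float, List[str]]:
    """Inquiry process, driven top-down by the diagnosis process's confidence."""
    answers: Dict[str, bool] = {}
    asked: List[str] = []
    symptoms = sorted(next(iter(lik.values())))
    for _ in range(max_turns):
        if max(diagnose(prior, lik, answers).values()) >= threshold:
            break  # the diagnosis is confident enough: stop inquiring
        candidates = [s for s in symptoms if s not in answers]
        if not candidates:
            break
        # Ask the symptom whose answer is expected to raise diagnostic confidence most.
        best = max(candidates, key=lambda s: expected_confidence(prior, lik, answers, s))
        answers[best] = ask(best)
        asked.append(best)
    post = diagnose(prior, lik, answers)
    top = max(post, key=post.get)
    return top, post[top], asked


prior = {"flu": 0.5, "measles": 0.3, "cold": 0.2}
lik = {
    "flu": {"fever": 0.9, "rash": 0.05, "sneeze": 0.4},
    "measles": {"fever": 0.8, "rash": 0.9, "sneeze": 0.1},
    "cold": {"fever": 0.2, "rash": 0.05, "sneeze": 0.9},
}
truth = {"fever": True, "rash": True, "sneeze": False}  # simulated patient's answers
disease, conf, asked = run_agent(prior, lik, truth.__getitem__)
print(disease, round(conf, 3), asked)
```

Although flu has the highest prior, the first inquiry is `rash`, because that answer is expected to sharpen the diagnosis most. The agent then settles on measles once the posterior crosses the threshold. The point of the design is that the inquiry choice is scored by the diagnosis process rather than by how often a symptom co-occurred with a disease in training data. This is the kind of symptom-disease shortcut the abstract calls distributional inquiry bias.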