Large language models (LLMs) are becoming increasingly relevant as a potential tool for healthcare, aiding communication between clinicians, researchers, and patients. However, traditional evaluations of LLMs on medical exam questions do not reflect the complexity of real patient-doctor interactions. One example of this complexity is patient self-diagnosis, in which a patient attempts to diagnose their own medical condition using various sources. While patients sometimes arrive at an accurate conclusion, they are more often led toward misdiagnosis because they over-emphasize bias-validating information. In this work, we present a variety of LLMs with multiple-choice questions from United States medical board exams, modified to include self-diagnostic reports from patients. Our findings show that when a patient proposes incorrect bias-validating information, the diagnostic accuracy of LLMs drops dramatically, revealing a high susceptibility to errors in self-diagnosis.