Medical Visual Question Answering (Med-VQA), which lies at the intersection of computer vision and natural language processing, is a challenging research topic. Its potential benefits include greater patient engagement and second opinions for clinical experts. However, existing Med-VQA methods based on joint embedding cannot explain whether their answers come from correct reasoning or from coincidence, which undermines the credibility of their answers. In this paper, we investigate how to construct a more cohesive and stable Med-VQA structure. Motivated by the notion of causal effect, we propose a novel Triangular Reasoning VQA (Tri-VQA) framework. It constructs reverse causal questions from the perspective of "Why this answer?" to elucidate the source of the answer and to stimulate a more reasonable forward reasoning process. We evaluate our method on an Endoscopic Ultrasound (EUS) multi-attribute annotated dataset collected from five centers, and further test it on medical VQA datasets. Experimental results demonstrate the superiority of our approach over existing methods. Our code and pre-trained models are available at https://anonymous.4open.science/r/Tri_VQA.