Visual question answering (VQA) is crucial for advancing surgical education. In practice, the needs of trainees constantly evolve: learning more surgical procedure types, adapting to different robotic platforms, and mastering new surgical instruments and techniques for various operations. However, patient data privacy often restricts access to old data when updating the model, necessitating an exemplar-free continual learning (CL) setup. Prior CL studies have overlooked two vital problems in the surgical domain: 1) large domain shifts arising from diverse surgical operations collected from multiple sources, and 2) severe data imbalance caused by the uneven presence of surgical instruments and activities. This paper addresses these problems with a multimodal large language model (LLM) and an adaptive weight assignment methodology. We first develop a new multi-teacher CL framework that leverages a multimodal LLM as an additional teacher. The strong generalization ability of the LLM bridges the knowledge gap when domain shifts and data imbalances occur. We then put forth a novel data processing method that transforms complex LLM embeddings into logits compatible with our CL framework. We further design an adaptive weight assignment approach that balances the generalization ability of the LLM against the domain expertise of the old CL model. Finally, to comprehensively evaluate the effectiveness of the proposed method, we construct two new surgical VQA datasets that differ substantially from existing ones and could serve as valuable resources for future research. Extensive experimental results on the tested datasets demonstrate the superiority of our method over other advanced CL schemes.