A Framework for Multimodal Medical Image Interaction

Physicians rely on images of human anatomy, such as magnetic resonance imaging (MRI) scans, to localize regions of interest in the patient during diagnosis and treatment. Despite advances in medical imaging technology, these images still convey information through a single modality: vision. A purely visual representation fails to capture the complexity of real, multisensory interaction with human tissue, yet perceiving multimodal information about the patient's anatomy and disease in real time is critical to the success of medical procedures and to patient outcomes. We introduce a Multimodal Medical Image Interaction (MMII) framework that gives medical experts dynamic, audiovisual interaction with human tissue in three-dimensional space. In a virtual reality environment, the user receives physically informed audiovisual feedback intended to improve the spatial perception of anatomical structures. MMII uses a model-based sonification approach to generate sounds derived from the geometry and physical properties of tissue, thereby eliminating the need for hand-crafted sound design. We conducted two user studies involving 34 general and 9 clinical experts to evaluate the framework's learnability, usability, and accuracy. The results showed excellent learnability of the audiovisual correspondence: the rate of correct associations improved significantly (p < 0.001) over the course of the study. MMII also yielded more accurate brain tumor localization (p < 0.05) than conventional medical image interaction. These findings support the potential of this novel framework to enhance interaction with medical images, for example during surgical procedures where immediate and precise feedback is needed.
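
To make the idea of model-based sonification concrete, the following is a minimal sketch and not the MMII implementation, whose physical model is not specified in this abstract. It assumes a single mass-spring-damper resonator as a stand-in for a tissue region: striking it with an impulse produces a decaying sinusoid whose pitch and decay follow from the physical parameters rather than from hand-crafted sound design. The function names and all parameter values are hypothetical and chosen only for illustration.

```python
import math


def _modal_parameters(stiffness, mass, damping):
    """Return (omega0, zeta, omega_d) for a mass-spring-damper system.

    omega0: undamped natural frequency in rad/s
    zeta: damping ratio (dimensionless)
    omega_d: damped natural frequency in rad/s
    """
    omega0 = math.sqrt(stiffness / mass)
    zeta = damping / (2.0 * math.sqrt(stiffness * mass))
    if zeta >= 1.0:
        raise ValueError("system is not underdamped, so it does not oscillate")
    omega_d = omega0 * math.sqrt(1.0 - zeta ** 2)
    return omega0, zeta, omega_d


def resonator_frequency_hz(stiffness, mass, damping):
    """Audible pitch (damped natural frequency) of the resonator in Hz."""
    _, _, omega_d = _modal_parameters(stiffness, mass, damping)
    return omega_d / (2.0 * math.pi)


def impulse_response(stiffness, mass, damping, duration=0.5, sample_rate=44100):
    """Samples of the normalized impulse response: exp(-zeta*omega0*t) * sin(omega_d*t)."""
    omega0, zeta, omega_d = _modal_parameters(stiffness, mass, damping)
    n_samples = int(duration * sample_rate)
    samples = []
    for i in range(n_samples):
        t = i / sample_rate
        samples.append(math.exp(-zeta * omega0 * t) * math.sin(omega_d * t))
    return samples


# Hypothetical parameters (not taken from the paper): a softer and a stiffer region.
soft_hz = resonator_frequency_hz(stiffness=1.0e6, mass=1.0, damping=20.0)
stiff_hz = resonator_frequency_hz(stiffness=4.0e6, mass=1.0, damping=20.0)
soft_sound = impulse_response(stiffness=1.0e6, mass=1.0, damping=20.0)
```

In this toy model the stiffer region sounds higher in pitch and a more strongly damped region decays faster, so the listener can tell the two apart by ear. A framework such as the one described would presumably use a richer physical model driven by the image geometry.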
