In the healthcare domain, summarizing medical questions posed by patients is critical for improving doctor-patient interactions and medical decision-making. Although medical data has grown in both complexity and quantity, research in this area has concentrated primarily on text-based methods and has overlooked the integration of visual cues. Moreover, prior work on medical question summarization has been limited to the English language. To address these gaps, this work introduces the task of multimodal medical question summarization for codemixed input in a low-resource setting. We present the Multimodal Medical Codemixed Question Summarization (MMCQS) dataset, which combines Hindi-English codemixed medical queries with visual aids. This integration enriches the representation of a patient's medical condition, providing a more comprehensive perspective. We also propose a framework named MedSumm that leverages large language models (LLMs) and vision-language models (VLMs) for this task. Using our MMCQS dataset, we demonstrate the value of integrating visual information from images to improve the generation of medically detailed summaries. This multimodal strategy not only improves healthcare decision-making but also promotes a deeper comprehension of patient queries, paving the way for future exploration in personalized and responsive medical care. Our dataset, code, and pre-trained models will be made publicly available.