In healthcare, multimodal data such as medical images and clinical reports is prevalent and must be analyzed comprehensively before diagnostic decisions are made. However, current large-scale artificial intelligence models predominantly focus on single-modal cognitive abilities and neglect the integration of multiple modalities. We therefore propose Stone Needle, a general multimodal large-scale model framework tailored explicitly for healthcare applications. Stone Needle serves as a comprehensive medical multimodal model foundation, integrating modalities such as text, images, videos, and audio to overcome the limitations of single-modal systems. Through four framework components (intent analysis, medical foundation models, a prompt manager, and a medical language module), the architecture supports multimodal interaction across multiple rounds of dialogue. The framework is general and can be tailored to specific tasks. Experimental results demonstrate that our method outperforms single-modal systems. The fusion of different modalities and the ability to process complex medical information in Stone Needle benefit accurate diagnosis, treatment recommendations, and patient care.