As large language models (LLMs) and multimodal techniques mature, the development of general-purpose multimodal large language models (MLLMs) has surged, with significant applications in interpreting natural images. The field of pathology, however, remains largely untapped, particularly with respect to gathering high-quality data and designing comprehensive model frameworks. To bridge this gap, we present PathAsst, a multimodal generative foundation AI assistant intended to revolutionize diagnostic and predictive analytics in pathology. The development of PathAsst involves three pivotal steps: data acquisition, CLIP model adaptation, and training of PathAsst's multimodal generative capabilities. First, we collect over 207K high-quality pathology image-text pairs from authoritative sources and, using ChatGPT, generate over 180K instruction-following samples. We also devise additional instruction-following data specifically tailored for invoking eight pathology-specific sub-models we prepared, allowing PathAsst to collaborate effectively with these models and thereby enhancing its diagnostic ability. Second, using the collected data, we construct PathCLIP, a pathology-dedicated CLIP, to improve PathAsst's ability to interpret pathology images. Finally, we integrate PathCLIP with Vicuna-13b and use pathology-specific instruction-tuning data to enhance the multimodal generation capacity of PathAsst and strengthen its interaction with the sub-models. The experimental results of PathAsst show the potential of harnessing AI-powered generative foundation models to improve pathology diagnosis and treatment processes.
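To make the CLIP adaptation step more concrete, the following is a minimal sketch of the symmetric contrastive objective that standard CLIP training uses on image-text pairs. The abstract does not spell out PathCLIP's training loss, so treating it as the standard CLIP objective is an assumption, as are the function names, the toy embeddings, and the temperature value of 0.07. In practice the embeddings would come from image and text encoders rather than hand-written lists.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def clip_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric cross-entropy over the image-text similarity matrix.

    Assumes image_embs[i] and text_embs[i] form a matched pair, and that
    every other combination in the batch is a negative. The temperature
    default is an illustrative assumption, not a value from the paper.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)

    # logits[i][j] = scaled cosine similarity between image i and text j
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts]
        for img in imgs
    ]

    # Image-to-text direction: each row should peak on its diagonal entry.
    i2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n

    # Text-to-image direction: each column should peak on its diagonal entry.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    t2i = -sum(log_softmax(columns[j])[j] for j in range(n)) / n

    return (i2t + t2i) / 2


# Toy batch of two pairs (hypothetical 2-d embeddings).
aligned_loss = clip_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                     [[1.0, 0.0], [0.0, 1.0]])
shuffled_loss = clip_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                      [[0.0, 1.0], [1.0, 0.0]])
```

Under these assumptions, the loss is close to zero when each image embedding lines up with its own caption and grows large when captions are swapped, which is the signal that pulls matched pathology image-text pairs together during fine-tuning.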