PathAsst: A Generative Foundation AI Assistant Towards Artificial General Intelligence of Pathology

As large language models (LLMs) and multimodal techniques mature, general-purpose multimodal large language models (MLLMs) have proliferated and found significant applications in interpreting natural images. Pathology, however, remains largely unexplored, particularly with respect to gathering high-quality data and designing comprehensive model frameworks. To bridge this gap, we present PathAsst, a multimodal generative foundation AI assistant aimed at revolutionizing diagnostic and predictive analytics in pathology. Developing PathAsst involves three pivotal steps: data acquisition, CLIP model adaptation, and training of PathAsst's multimodal generative capabilities. First, we collect over 207K high-quality pathology image-text pairs from authoritative sources and use ChatGPT to generate over 180K instruction-following samples. We also devise additional instruction-following data tailored for invoking eight pathology-specific sub-models that we prepared, allowing PathAsst to collaborate effectively with these models and thereby enhance its diagnostic ability. Second, using the collected data, we construct PathCLIP, a pathology-dedicated CLIP, to improve PathAsst's ability to interpret pathology images. Finally, we integrate PathCLIP with Vicuna-13b and use pathology-specific instruction-tuning data to strengthen PathAsst's multimodal generation capacity and its interaction with the sub-models. The experimental results show the potential of AI-powered generative foundation models to improve pathology diagnosis and treatment processes.
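The CLIP adaptation step can be illustrated with the symmetric contrastive objective used by the original CLIP. The abstract does not state PathCLIP's training objective, so this is only a minimal sketch under the assumption that PathCLIP follows standard CLIP training on the image-text pairs. The function name `clip_contrastive_loss`, the fixed `temperature=0.07` (learnable in CLIP itself), and the toy embeddings are illustrative assumptions, not details from the paper.

```python
import math


def _normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def _log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def clip_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric cross-entropy over image-text similarity logits.

    Assumption: pair i in image_embs matches pair i in text_embs, and every
    other pairing in the batch is treated as a negative.
    """
    if len(image_embs) != len(text_embs):
        raise ValueError("image and text batches must have the same size")
    imgs = [_normalize(v) for v in image_embs]
    txts = [_normalize(v) for v in text_embs]
    n = len(imgs)
    # logits[i][j] = cosine similarity of image i and text j, scaled by temperature
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts]
        for img in imgs
    ]
    # Image-to-text direction: row i should assign its mass to column i.
    loss_i2t = -sum(_log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-image direction: column j should assign its mass to row j.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2i = -sum(_log_softmax(columns[j])[j] for j in range(n)) / n
    return (loss_i2t + loss_t2i) / 2


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for encoder outputs (hypothetical values).
    aligned = clip_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
    shuffled = clip_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
    print(aligned < shuffled)
```

In this toy batch, correctly paired embeddings yield a lower loss than mismatched ones, which is the signal that would pull pathology images toward their own captions during adaptation.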
