A Foundational Multimodal Vision Language AI Assistant for Human Pathology

Computational pathology has seen remarkable progress in both task-specific predictive models and task-agnostic self-supervised vision encoders. However, despite the explosive growth of generative artificial intelligence (AI), there has been limited study of general-purpose, multimodal AI assistants tailored to pathology. Here we present PathChat, a vision-language generalist AI assistant for human pathology. It uses an in-house developed foundational vision encoder pretrained on 100 million histology images from over 100,000 patient cases and 1.18 million pathology image-caption pairs. The vision encoder is combined with a pretrained large language model, and the whole system is finetuned on over 250,000 diverse, disease-agnostic visual language instructions. We compare PathChat against several multimodal vision-language AI assistants as well as GPT4V, which powers the commercially available multimodal general-purpose AI assistant ChatGPT-4. When relevant clinical context is provided with the histology image, PathChat achieves a diagnostic accuracy of 87% on multiple-choice questions based on publicly available cases of diverse tissue origins and disease models. Additionally, using open-ended questions and human expert evaluation, we find that overall PathChat produces more accurate and pathologist-preferable responses to diverse pathology-related queries than the other assistants evaluated. As an interactive, general vision-language AI assistant that can flexibly handle both visual and natural language inputs, PathChat could find impactful applications in pathology education, research, and human-in-the-loop clinical decision making.
