Computational pathology demands both visual pattern recognition and dynamic integration of structured domain knowledge, including taxonomy, grading criteria, and clinical evidence. In practice, diagnostic reasoning requires linking morphological evidence with formal diagnostic and grading criteria. Although multimodal large language models (MLLMs) demonstrate strong vision-language reasoning capabilities, they lack explicit mechanisms for structured knowledge integration and interpretable memory control. As a result, existing models struggle to consistently incorporate pathology-specific diagnostic standards during reasoning. Inspired by the hierarchical memory process of human pathologists, we propose PathMem, a memory-centric multimodal framework for pathology MLLMs. PathMem organizes structured pathology knowledge as a long-term memory (LTM) and introduces a Memory Transformer that models the dynamic transition from LTM to working memory (WM) through multimodal memory activation and context-aware knowledge grounding, enabling context-aware memory refinement for downstream reasoning. PathMem achieves state-of-the-art performance across benchmarks, improving WSI-Bench report generation by 12.8% in WSI-Precision and 10.1% in WSI-Relevance, and open-ended diagnosis by 9.7% and 8.9%, over prior WSI-based models.
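As a rough illustration of the LTM-to-WM transition described above, the sketch below implements a generic top-k attention-based retrieval over a bank of knowledge embeddings. This is not the paper's Memory Transformer; all names (`activate_memory`, the dimensions, the random embeddings) are hypothetical stand-ins, assuming LTM entries and the multimodal query live in a shared embedding space.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical long-term memory: n_entries knowledge embeddings
# (e.g., taxonomy nodes, grading criteria) of dimension d, plus a
# fused multimodal query vector (image + clinical context).
d, n_entries, k = 16, 8, 3
ltm = rng.standard_normal((n_entries, d))   # long-term memory bank
query = rng.standard_normal(d)              # multimodal query

def activate_memory(query, ltm, k):
    """Score each LTM entry against the query, keep the top-k,
    and return an attention-weighted working-memory summary."""
    scores = ltm @ query / np.sqrt(ltm.shape[1])  # scaled dot-product scores
    top = np.argsort(scores)[-k:]                 # indices of the most relevant entries
    weights = softmax(scores[top])                # renormalize over the selection
    wm = weights @ ltm[top]                       # working-memory vector
    return wm, top

wm, selected = activate_memory(query, ltm, k)
print(wm.shape, sorted(selected.tolist()))
```

The selected subset plays the role of context-activated working memory: only the entries most relevant to the current case are carried forward for downstream reasoning, rather than the full knowledge bank.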