PathReasoner-R1: Instilling Structured Reasoning into Pathology Vision-Language Model via Knowledge-Guided Policy Optimization

Vision-Language Models (VLMs) are advancing computational pathology through their strong visual understanding capabilities. However, current systems often reduce diagnosis to directly outputting conclusions, without verifiable, evidence-linked reasoning, which severely limits clinical trust and hinders error correction by experts. To address these barriers, we construct PathReasoner, the first large-scale dataset for whole-slide image (WSI) reasoning. Unlike previous work that relies on unverified distillation, we develop a rigorous knowledge-guided generation pipeline. By leveraging medical knowledge graphs, we explicitly align structured pathological findings and clinical reasoning with diagnoses, generating over 20K high-quality instruction samples. Building on this dataset, we propose PathReasoner-R1, which combines trajectory-masked supervised fine-tuning with reasoning-oriented reinforcement learning to instill structured chain-of-thought capabilities. To ensure medical rigor, we design a knowledge-aware multi-granular reward function that incorporates an Entity Reward mechanism strictly aligned with knowledge graphs. This guides the model to optimize for logical consistency rather than mere outcome matching, thereby enhancing robustness. Extensive experiments demonstrate that PathReasoner-R1 achieves state-of-the-art performance on both PathReasoner and public benchmarks across various image scales, equipping pathology models with transparent, clinically grounded reasoning capabilities. Dataset and code are available at https://github.com/cyclexfy/PathReasoner-R1.
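
The abstract does not give the form of the reward function, so the following is only an illustrative sketch of how an entity-level term could be combined with coarser signals. It assumes, hypothetically, that the Entity Reward is an F1 overlap between the entities mentioned in a reasoning trace and a reference set derived from a knowledge graph, and that "multi-granular" means a weighted sum of format, entity, and outcome terms. The function names, the weights, and the F1 choice are all assumptions and are not taken from the PathReasoner-R1 code.

```python
from typing import Set, Tuple


def entity_f1(predicted: Set[str], reference: Set[str]) -> float:
    """F1 overlap between predicted entities and a reference entity set.

    Hypothetical stand-in for an entity-level reward; the paper's actual
    Entity Reward may be defined differently.
    """
    if not predicted or not reference:
        return 0.0
    hits = len(predicted & reference)
    if hits == 0:
        return 0.0
    precision = hits / len(predicted)
    recall = hits / len(reference)
    return 2 * precision * recall / (precision + recall)


def combined_reward(
    format_ok: bool,
    predicted_entities: Set[str],
    reference_entities: Set[str],
    answer_correct: bool,
    weights: Tuple[float, float, float] = (0.2, 0.4, 0.4),  # assumed values
) -> float:
    """Weighted sum of format, entity, and outcome terms (illustrative only)."""
    w_format, w_entity, w_outcome = weights
    return (
        w_format * float(format_ok)
        + w_entity * entity_f1(predicted_entities, reference_entities)
        + w_outcome * float(answer_correct)
    )
```

Under these assumptions, a response with a correct final diagnosis but no supporting entities scores lower than one whose reasoning also names the expected findings, which is the sense in which such a reward favors consistent reasoning over mere outcome matching.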
