Pathological captioning of Whole Slide Images (WSIs), though essential to computer-aided pathological diagnosis, has rarely been studied because of limitations in datasets and in model training efficacy. In this paper, we propose a new paradigm, the Subtype-guided Masked Transformer (SGMT), for Transformer-based pathological captioning; it treats a WSI as a sequence of sparse patches and generates a single caption for the whole slide from that sequence. An accompanying subtype prediction task is introduced into SGMT to guide training and improve captioning accuracy. We also present an Asymmetric Masked Mechanism to cope with the very large size of pathological images: the number of patches sampled into the SGMT input sequence differs between the training and inference phases. Experiments on the PatchGastricADC22 dataset demonstrate that our approach adapts effectively to the task with a Transformer-based model and outperforms traditional RNN-based methods. Our code will be made available for further research and development.
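To make the asymmetric sampling idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes that a WSI is already available as a list of patches and that a smaller random subset is kept during training than during inference; the abstract only states that the two numbers differ, so the direction and the values `train_keep=64` and `infer_keep=256` are illustrative assumptions, as are the function names.

```python
import random


def sample_patch_indices(num_patches, keep, seed=None):
    """Return a sorted random subset of patch indices.

    At most `keep` indices are returned; if the slide has fewer
    patches than `keep`, all of them are kept.
    """
    rng = random.Random(seed)
    keep = min(keep, num_patches)
    # Sorting preserves the original patch order within the sequence.
    return sorted(rng.sample(range(num_patches), keep))


def select_patches(patches, training, train_keep=64, infer_keep=256, seed=None):
    """Pick the patch subsequence fed to the model.

    Hypothetical helper: the budgets `train_keep` and `infer_keep`
    are assumed values, not figures reported in the paper.
    """
    keep = train_keep if training else infer_keep
    indices = sample_patch_indices(len(patches), keep, seed)
    return [patches[i] for i in indices]


if __name__ == "__main__":
    # Stand-in for patch embeddings: integers instead of feature vectors.
    slide = list(range(1000))
    print(len(select_patches(slide, training=True, seed=0)))
    print(len(select_patches(slide, training=False, seed=0)))
```

Under these assumptions the same slide yields a shorter sequence in training than at inference, which is one plausible way to keep the training cost bounded while still using more of the slide when generating a caption.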