The development of computer vision solutions for gigapixel images in digital pathology is hampered by significant computational limitations due to the large size of whole slide images. In particular, digitizing biopsies at high resolutions is a time-consuming process, yet it remains necessary because model performance degrades as image detail decreases. To alleviate this issue, recent literature has proposed using knowledge distillation to enhance model performance at reduced image resolutions. Specifically, soft labels and features extracted at the highest magnification level are distilled into a model that takes lower-magnification images as input. However, this approach fails to transfer knowledge about the image regions that are most discriminative for classification, which may be lost when the resolution is decreased. In this work, we propose to distill this information by incorporating attention maps during training. Our formulation leverages saliency maps of the target class obtained via Grad-CAM, which guide the lower-resolution Student model to match the Teacher distribution by minimizing the L2 distance between them. Comprehensive experiments on prostate histology image grading demonstrate that the proposed approach substantially improves model performance across different image resolutions compared to previous literature.
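The attention-distillation term described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the sum-to-one normalization used to treat each map as a distribution, and the average pooling used to bring the Teacher map down to the Student's spatial size are all assumptions made for illustration. The activations and gradients are assumed to have already been taken from a convolutional layer with respect to the target-class score.

```python
import numpy as np


def grad_cam(activations, gradients):
    """Grad-CAM map from (K, H, W) activations and target-class gradients."""
    # Channel weights: global average of the gradients over space.
    weights = gradients.mean(axis=(1, 2))              # shape (K,)
    # Weighted sum of the feature maps, followed by ReLU.
    cam = np.tensordot(weights, activations, axes=1)   # shape (H, W)
    return np.maximum(cam, 0.0)


def to_distribution(cam, eps=1e-8):
    """Normalize a non-negative map so it sums to (about) one. Assumed choice."""
    return cam / (cam.sum() + eps)


def average_pool(cam, factor):
    """Downsample an (H, W) map by an integer factor. Assumed resizing choice."""
    h, w = cam.shape
    return cam.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))


def attention_distillation_loss(teacher_cam, student_cam):
    """L2 distance between the normalized Teacher and Student attention maps."""
    # Hypothetical setup: the Teacher map is an integer multiple of the Student map.
    factor = teacher_cam.shape[0] // student_cam.shape[0]
    teacher_small = average_pool(teacher_cam, factor)
    diff = to_distribution(teacher_small) - to_distribution(student_cam)
    return float(np.sqrt(np.sum(diff ** 2)))
```

In a training loop, a term like this would presumably be added to the classification loss and to the soft-label and feature distillation terms with some weighting. Those weights are not given in the text above and are left out of the sketch.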