Benchmarking PathCLIP for Pathology Image Analysis

Accurate image classification and retrieval are important for clinical diagnosis and treatment decision-making. The recent contrastive language-image pretraining (CLIP) model has shown remarkable proficiency in understanding natural images. Drawing inspiration from CLIP, PathCLIP is designed specifically for pathology image analysis and is trained on over 200,000 image-text pairs. While the performance of PathCLIP is impressive, its robustness under a wide range of image corruptions remains unknown. We therefore conduct an extensive evaluation of PathCLIP on corrupted images from the Osteosarcoma and WSSS4LUAD datasets. We introduce seven corruption types (brightness, contrast, Gaussian blur, resolution, saturation, hue, and markup), each at four severity levels. We find that PathCLIP is relatively robust to image corruptions and surpasses OpenAI-CLIP and PLIP in zero-shot classification. Among the seven corruptions, blur and reduced resolution cause severe performance degradation in PathCLIP, which indicates that ensuring image quality is crucial before clinical testing. Additionally, we assess the robustness of PathCLIP in image-to-image retrieval and find that, under diverse corruptions, PathCLIP performs less effectively than PLIP on Osteosarcoma but better on WSSS4LUAD. Overall, PathCLIP delivers impressive zero-shot classification and retrieval performance on pathology images, but it should be used with appropriate care. We hope this study provides a qualitative impression of PathCLIP and helps clarify its differences from other CLIP models.
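In CLIP-style models, both evaluated tasks typically reduce to comparing embeddings by cosine similarity. Zero-shot classification picks the class whose text-prompt embedding is closest to the image embedding. Image-to-image retrieval ranks gallery images by their similarity to a query image. The sketch below assumes the embeddings have already been produced by a model's image and text encoders, which are not shown. The function names, class names, and three-dimensional toy vectors are hypothetical and chosen only for illustration. They are not taken from PathCLIP or from the benchmark.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity: dot product divided by the product of L2 norms."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def zero_shot_classify(image_emb: Vector, class_text_embs: Dict[str, Vector]) -> str:
    """Return the class whose text-prompt embedding is most similar to the image.

    CLIP-style models usually scale these similarities by a temperature and
    apply a softmax; that monotonic rescaling does not change the top class.
    """
    return max(class_text_embs, key=lambda c: cosine(image_emb, class_text_embs[c]))


def retrieve(query_emb: Vector, gallery: Dict[str, Vector], k: int = 1) -> List[str]:
    """Image-to-image retrieval: ids of the k gallery images closest to the query."""
    ranked = sorted(gallery, key=lambda i: cosine(query_emb, gallery[i]), reverse=True)
    return ranked[:k]


# Hypothetical toy embeddings standing in for real encoder outputs.
TEXT_EMBS = {
    "tumor": [1.0, 0.0, 0.0],
    "stroma": [0.0, 1.0, 0.0],
    "normal": [0.0, 0.0, 1.0],
}
GALLERY = {
    "img_a": [0.8, 0.1, 0.1],
    "img_b": [0.1, 0.9, 0.0],
    "img_c": [0.7, 0.3, 0.0],
}

if __name__ == "__main__":
    query = [0.9, 0.2, 0.1]
    print(zero_shot_classify(query, TEXT_EMBS))  # tumor
    print(retrieve(query, GALLERY, k=2))         # ['img_a', 'img_c']
```

A robustness benchmark of the kind described above would apply the same two functions to embeddings of clean and corrupted versions of each image. It would then compare classification accuracy or retrieval quality at each severity level. That evaluation loop is omitted from this sketch.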

翻译：准确的图像分类与检索对于临床诊断和治疗决策至关重要。最新对比语言-图像预训练（CLIP）模型在理解自然图像方面展现出卓越能力。受CLIP启发，PathCLIP针对病理图像分析专门设计，利用超过20万组图像-文本对进行训练。尽管PathCLIP性能令人印象深刻，但其在多种图像失真下的鲁棒性仍未知。因此，我们对来自骨肉瘤和WSSS4LUAD数据集的各类失真图像开展了全面评估。实验中，我们引入七种失真类型（亮度、对比度、高斯模糊、分辨率、饱和度、色调和标注），每种设置四个严重等级。实验发现，PathCLIP对图像失真具有相对鲁棒性，在零样本分类任务中超越OpenAI-CLIP和PLIP。在七种失真中，模糊和分辨率下降会导致PathCLIP性能严重退化，表明临床检测前确保图像质量至关重要。此外，我们评估了PathCLIP在图像-图像检索任务中的鲁棒性，发现其在骨肉瘤数据集上的表现不及PLIP，但在WSSS4LUAD数据集上，面对多种失真时表现更优。总体而言，PathCLIP在病理图像的零样本分类和检索任务中展现出令人瞩目的性能，但使用时仍需适当注意。我们希望本研究能提供对PathCLIP的定性认知，并帮助理解其与其他CLIP模型的差异。