Colorectal cancer (CRC) remains one of the leading causes of cancer-related morbidity and mortality worldwide, and gastrointestinal (GI) polyps are recognized by the World Health Organization (WHO) as critical precursor lesions. Early and accurate segmentation of polyps during colonoscopy is essential for preventing progression to CRC, yet manual delineation is labor-intensive and prone to inter-observer variability. Deep learning methods have demonstrated strong potential for automated polyp analysis, but their limited interpretability remains a barrier to clinical adoption. In this study, we present PolypSeg-GradCAM, an explainable deep learning framework that integrates a U-Net architecture with a pre-trained ResNet-34 backbone and Gradient-weighted Class Activation Mapping (Grad-CAM) for transparent polyp segmentation. To ensure rigorous benchmarking, the model was trained and evaluated with 5-fold cross-validation on the Kvasir-SEG dataset of 1,000 annotated endoscopic images. Experimental results show a mean Dice coefficient of 0.8902 ± 0.0125, a mean Intersection-over-Union (IoU) of 0.8023, and an Area Under the Receiver Operating Characteristic Curve (AUC-ROC) of 0.9722. Further quantitative analysis at an optimal operating threshold yielded a sensitivity of 0.9058 and a precision of 0.9083. In addition, Grad-CAM visualizations confirmed that predictions were guided by clinically relevant regions, offering insight into the model's decision-making process. This study demonstrates that combining segmentation accuracy with interpretability can support the development of trustworthy AI-assisted colonoscopy tools.
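To make the reported evaluation metrics concrete, the sketch below shows the standard definitions of the Dice coefficient and IoU on binary masks, with predicted probabilities binarized at a chosen threshold (the 0.5 value here is illustrative; the paper tunes an optimal threshold). This is a minimal, self-contained illustration of the metric formulas, not the study's actual evaluation code.

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """Dice = 2|A ∩ B| / (|A| + |B|) on binary masks."""
    pred = np.asarray(pred).astype(bool)
    target = np.asarray(target).astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def iou(pred, target, eps=1e-7):
    """IoU = |A ∩ B| / |A ∪ B| on binary masks."""
    pred = np.asarray(pred).astype(bool)
    target = np.asarray(target).astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return (inter + eps) / (union + eps)

# Toy example: binarize predicted probabilities at an illustrative threshold.
probs = np.array([[0.9, 0.8, 0.2],
                  [0.7, 0.1, 0.05]])
target = np.array([[1, 1, 0],
                   [1, 0, 0]])
pred = probs >= 0.5  # in practice the threshold is tuned, e.g. via ROC analysis
print(round(float(dice_coefficient(pred, target)), 4))  # → 1.0 (perfect overlap here)
print(round(float(iou(pred, target)), 4))               # → 1.0
```

In a per-image evaluation such as 5-fold cross-validation, these scores would be averaged over all validation images in each fold and then across folds to obtain the reported mean ± standard deviation.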