Ulcerative colitis (UC) is a chronic mucosal inflammatory condition that places patients at increased risk of colorectal cancer. Colonoscopic surveillance remains the gold standard for assessing disease activity, and reporting typically relies on standardised endoscopic scoring metrics. The most widely used is the Mayo Endoscopic Score (MES), with some centres also adopting the Ulcerative Colitis Endoscopic Index of Severity (UCEIS). Both are descriptive assessments of mucosal inflammation (MES: 0 to 3; UCEIS: 0 to 8), where higher values indicate more severe disease. However, computational methods for automatically predicting these scores remain limited, largely due to the lack of publicly available expert-annotated datasets and the absence of robust benchmarking. There is also a significant research gap in generating clinically meaningful descriptions of UC images, even though image captioning is a well-established computer vision task. Variability in endoscopic systems and procedural workflows across centres further highlights the need for multi-centre datasets to ensure algorithmic robustness and generalisability. In this work, we introduce a curated multi-centre, multi-resolution dataset that includes expert-validated MES and UCEIS labels alongside detailed clinical descriptions. To our knowledge, this is the first comprehensive dataset that combines dual scoring metrics for classification tasks with expert-generated captions, describing mucosal appearance and clinically accepted reasoning, for image captioning. This resource opens new opportunities for developing clinically meaningful multimodal algorithms. Alongside the dataset, we provide benchmark results for convolutional neural networks, vision transformers, hybrid models, and widely used multimodal vision-language captioning algorithms.