The development of artificial intelligence systems for colonoscopy analysis often necessitates expert-annotated image datasets. However, limitations in dataset size and diversity impede model performance and generalisation. Image-text colonoscopy records from routine clinical practice, comprising millions of images and text reports, serve as a valuable data source, though annotating them is labour-intensive. Here we leverage recent advancements in large language and vision models and propose EndoKED, a data mining paradigm for deep knowledge extraction and distillation. EndoKED automates the transformation of raw colonoscopy records into image datasets with pixel-level annotation. We validate EndoKED using multi-centre datasets of raw colonoscopy records (~1 million images), demonstrating its superior performance in training polyp detection and segmentation models. Furthermore, the EndoKED pre-trained vision backbone enables data-efficient and generalisable learning for optical biopsy, achieving expert-level performance in both retrospective and prospective validation.