Background and aims: Generalizability of AI colonoscopy algorithms is important for wider adoption in clinical practice. However, current techniques for evaluating performance on unseen data require expensive and time-intensive labels.

Methods: We use a "Masked Siamese Network" (MSN) to identify novel phenomena in unseen data and predict polyp detector performance. MSN is trained to predict masked-out regions of polyp images, without any labels. We test whether MSN, trained only on data from Israel, can detect unseen techniques, narrow-band imaging (NBI) and chromoendoscopy (CE), in colonoscopies from Japan (354 videos, 128 hours). We also test MSN's ability to predict the performance of computer-aided detection (CADe) of polyps on colonoscopies from both countries, even though MSN is not trained on data from Japan.

Results: Using the label-free Fréchet distance, MSN correctly identifies NBI and CE as less similar to Israel white light than Japan white light is (bootstrapped z-test, |z| > 496, p < 10⁻⁸ for both). MSN detects NBI with 99% accuracy, predicts CE better than our heuristic (90% vs 79% accuracy) despite being trained only on white light, and is the only method that is robust to noisy labels. MSN predicts CADe polyp detector performance on in-domain Israel and out-of-domain Japan colonoscopies (r = 0.79 and 0.37, respectively). With a few examples of Japan detector performance to train on, MSN's prediction of Japan performance improves (r = 0.56).

Conclusion: Our technique can identify distribution shifts in clinical data and can predict CADe detector performance on unseen data, without labels. Our self-supervised approach can help detect when data seen in practice differs from the training data, for example across hospitals or when data has shifted meaningfully since training. MSN has potential for application to medical image domains beyond colonoscopy.
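To illustrate the kind of label-free comparison the Results describe, here is a minimal sketch of a Fréchet distance between two sets of embeddings. It assumes each set is summarized by a fitted Gaussian (mean and covariance), as in the widely used Fréchet Inception Distance, and returns the squared distance that is conventionally reported. The abstract does not state how the authors compute their distance, so this formulation, the function name `frechet_distance`, and the synthetic arrays standing in for MSN embeddings are all assumptions made for illustration.

```python
import numpy as np


def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Squared Frechet distance between Gaussians fitted to two embedding sets.

    feats_a, feats_b: arrays of shape (n_samples, n_features).
    Hypothetical helper; not the authors' implementation.
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)

    # Tr((cov_a cov_b)^{1/2}) equals the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b, which are real and non-negative in theory.
    # Clip tiny negative values caused by floating-point error.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    mean_term = np.sum((mu_a - mu_b) ** 2)
    return float(mean_term + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)


# Synthetic stand-ins for embeddings; NOT data from the study.
rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, size=(2000, 8))  # e.g. training-domain frames
similar = rng.normal(0.1, 1.0, size=(2000, 8))    # mild shift
shifted = rng.normal(1.5, 1.0, size=(2000, 8))    # strong shift

d_similar = frechet_distance(reference, similar)
d_shifted = frechet_distance(reference, shifted)
print(f"similar: {d_similar:.3f}, shifted: {d_shifted:.3f}")
```

In this toy setup the strongly shifted set should sit farther from the reference than the mildly shifted one, which mirrors the ordering the abstract reports for NBI and CE relative to Japan white light. No labels are needed at any step, only the embeddings themselves.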