$\textbf{Background}$: Generalizability of AI colonoscopy algorithms is important for their wider adoption in clinical practice. However, current techniques for evaluating performance on unseen data require expensive and time-intensive labels.

$\textbf{Methods}$: We use a Masked Siamese Network (MSN) to identify novel phenomena in unseen data and to predict polyp detector performance. MSN is trained, without any labels, to predict masked-out regions of polyp images. We test whether MSN, trained only on data from Israel, can detect two techniques it has never seen, narrow-band imaging (NBI) and chromoendoscopy (CE), in colonoscopies from Japan (354 videos, 128 hours). We also test MSN's ability to predict the performance of computer-aided detection (CADe) of polyps on colonoscopies from both countries, even though MSN is not trained on data from Japan.

$\textbf{Results}$: Using the label-free Fréchet distance, MSN correctly identifies NBI and CE as less similar to Israel white-light data than Japan white-light data is (bootstrapped z-test, $|z| > 496$, $p < 10^{-8}$ for both). MSN detects NBI with 99% accuracy, identifies CE more accurately than our heuristic (90% vs. 79% accuracy) despite being trained only on white-light images, and is the only method that is robust to noisy labels. MSN predicts CADe polyp detector performance on in-domain Israel and out-of-domain Japan colonoscopies ($r = 0.79$ and $r = 0.37$, respectively). When a few examples of detector performance on Japan data are available for training, MSN's prediction of Japan performance improves ($r = 0.56$).

$\textbf{Conclusion}$: Our technique can identify distribution shifts in clinical data and predict CADe detector performance on unseen data, without labels. This self-supervised approach can help detect when data in practice differs from the training data, for example between hospitals or when data has meaningfully shifted since training. MSN has potential for application to medical imaging domains beyond colonoscopy.
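The label-free comparison behind the Results can be illustrated with a short sketch. This is not the authors' implementation: it assumes each frame has already been mapped to an embedding vector by an encoder such as MSN (replaced here by synthetic Gaussian vectors), fits a Gaussian to each set of embeddings, and computes the squared Fréchet distance between the two Gaussians, as in FID. The helper `bootstrap_z` and its resampling scheme are hypothetical, since the abstract does not specify how the bootstrapped z-test was run; the arrays `israel_wl`, `japan_wl` and `japan_nbi` are made-up stand-ins.

```python
import numpy as np


def _sqrtm_psd(mat: np.ndarray) -> np.ndarray:
    """Principal square root of a symmetric positive semi-definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # drop tiny negative eigenvalues caused by round-off
    return (vecs * np.sqrt(vals)) @ vecs.T


def frechet_distance(x: np.ndarray, y: np.ndarray) -> float:
    """Squared Frechet distance between Gaussians fitted to two embedding sets.

    x, y: arrays of shape (n_samples, dim), e.g. per-frame encoder embeddings.
    FD = ||mu_x - mu_y||^2 + Tr(C_x + C_y - 2 (C_x C_y)^{1/2})
    """
    mu_x, mu_y = x.mean(axis=0), y.mean(axis=0)
    cov_x = np.cov(x, rowvar=False)
    cov_y = np.cov(y, rowvar=False)
    # Tr((C_x C_y)^{1/2}) equals the sum of square roots of the eigenvalues of
    # the symmetric PSD matrix C_x^{1/2} C_y C_x^{1/2}.
    sx = _sqrtm_psd(cov_x)
    m = sx @ cov_y @ sx
    m = 0.5 * (m + m.T)  # enforce exact symmetry before eigvalsh
    tr_sqrt = np.sqrt(np.clip(np.linalg.eigvalsh(m), 0.0, None)).sum()
    diff = mu_x - mu_y
    return float(diff @ diff + np.trace(cov_x) + np.trace(cov_y) - 2.0 * tr_sqrt)


def bootstrap_z(ref, a, b, n_boot=200, seed=0):
    """Hypothetical z-score for FD(ref, a) - FD(ref, b) under bootstrap resampling."""
    rng = np.random.default_rng(seed)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        # Resample each set with replacement, keeping its original size.
        r = ref[rng.integers(0, len(ref), len(ref))]
        ra = a[rng.integers(0, len(a), len(a))]
        rb = b[rng.integers(0, len(b), len(b))]
        diffs[i] = frechet_distance(r, ra) - frechet_distance(r, rb)
    return float(diffs.mean() / diffs.std(ddof=1))


# Synthetic stand-ins for embeddings (not real data).
rng = np.random.default_rng(42)
dim = 16
israel_wl = rng.normal(0.0, 1.0, size=(500, dim))   # reference domain
japan_wl = rng.normal(0.1, 1.0, size=(500, dim))    # small shift
japan_nbi = rng.normal(1.0, 1.5, size=(500, dim))   # larger shift

print(frechet_distance(israel_wl, japan_wl), frechet_distance(israel_wl, japan_nbi))
print(bootstrap_z(israel_wl, japan_nbi, japan_wl))
```

A larger distance to the reference set indicates a stronger distribution shift, and a positive z means the first comparison set lies farther from the reference than the second. Computing the cross term through $C_x^{1/2} C_y C_x^{1/2}$ keeps the matrix symmetric, so `numpy.linalg.eigvalsh` can replace a general (and less stable) matrix square root.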