Screening Papanicolaou test samples effectively reduces cervical cancer-related mortality, but a shortage of trained cytopathologists prevents its widespread adoption in low-resource settings. Developing AI algorithms, such as deep learning models, that analyze digitized cytology images and are suited to resource-constrained countries is therefore appealing. Although successful, such approaches come at the price of collecting large annotated training datasets, which is both costly and time-consuming. Our study shows that the large number of unlabeled images that can be sampled from digitized cytology slides makes for fertile ground where self-supervised learning methods can thrive and even outperform off-the-shelf deep learning models on various downstream tasks. Along the same lines, we report improved performance and data efficiency using modern augmentation strategies.