Image-based cell profiling aims to produce informative representations of cell images. The technique is critical to drug discovery and has advanced considerably with recent progress in computer vision. Inspired by recent developments in non-contrastive self-supervised learning (SSL), this paper presents an initial exploration of using such methods to train a generalizable feature extractor for cell images. Doing so raises two major challenges. First, unlike typical settings where each representation is derived from a single image, cell profiling often involves multiple input images, which makes it difficult to fuse all the available information effectively. Second, the distribution of cell images differs substantially from that of natural images, so the view-generation process used in existing SSL methods fails. To address these issues, we propose a self-supervised framework with local aggregation that improves the cross-site consistency of cell representations. We also introduce data augmentation and representation post-processing methods tailored to cell images, which address both challenges and yield a robust feature extractor. With these improvements, the proposed framework won the Cell Line Transferability challenge at CVPR 2025.
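The abstract does not describe how the framework fuses multiple site images, so the following is only a point of reference for the first challenge. It is a naive baseline, not the paper's local aggregation. It averages unit-normalized per-site embeddings into one vector and measures cross-site consistency as the mean pairwise cosine similarity between sites. The function names `fuse_sites` and `cross_site_consistency` and the `(n_sites, dim)` input layout are hypothetical choices made for this illustration.

```python
import numpy as np


def l2_normalize(x: np.ndarray, axis: int = -1, eps: float = 1e-8) -> np.ndarray:
    """Scale vectors to (approximately) unit L2 norm along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def fuse_sites(site_embeddings: np.ndarray) -> np.ndarray:
    """Hypothetical naive fusion: mean of unit-normalized per-site embeddings.

    site_embeddings: array of shape (n_sites, dim), one row per imaged site.
    Returns a single unit-norm vector of shape (dim,).
    """
    return l2_normalize(l2_normalize(site_embeddings).mean(axis=0))


def cross_site_consistency(site_embeddings: np.ndarray) -> float:
    """Mean pairwise cosine similarity between distinct sites of the same sample."""
    n = site_embeddings.shape[0]
    if n < 2:
        raise ValueError("need at least two sites to measure consistency")
    z = l2_normalize(site_embeddings)
    sim = z @ z.T                             # (n, n) cosine-similarity matrix
    off_diag = sim[~np.eye(n, dtype=bool)]    # drop self-similarities
    return float(off_diag.mean())


# Usage: two orthogonal site embeddings are maximally inconsistent (0.0),
# while identical ones are fully consistent (about 1.0).
orthogonal = np.array([[1.0, 0.0], [0.0, 1.0]])
identical = np.array([[1.0, 2.0], [1.0, 2.0]])
print(fuse_sites(orthogonal))
print(cross_site_consistency(orthogonal), cross_site_consistency(identical))
```

Plain averaging treats every site equally, so a single noisy or off-distribution site pulls the fused vector away from the others. A low consistency score flags that situation, which is the kind of cross-site disagreement the proposed framework is designed to reduce.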