High-content imaging assays can capture rich phenotypic response data for large sets of compound treatments, aiding the characterization and discovery of novel drugs. However, extracting representative features from high-content images that capture subtle phenotypic differences remains challenging. The lack of high-quality labels makes it difficult to achieve satisfactory results with supervised deep learning. Self-supervised learning methods have shown great success on natural images and offer an attractive alternative for microscopy images as well. However, we find that self-supervised learning techniques underperform on high-content imaging assays. One challenge is the presence of batch effects: undesirable domain shifts in the data caused by biological noise or uncontrolled experimental conditions. To address this, we introduce Cross-Domain Consistency Learning (CDCL), a self-supervised approach that is able to learn in the presence of batch effects. CDCL enforces the learning of biological similarities while disregarding undesirable batch-specific signals, leading to more useful and versatile representations. The learned features are organised according to morphological changes and are more useful for downstream tasks, such as distinguishing treatments and mechanisms of action.