ContriMix: Unsupervised disentanglement of content and attribute for domain generalization in microscopy image analysis

Tan H. Nguyen, Dinkar Juyal, Jin Li, Aaditya Prakash, Shima Nofallah, Chintan Shah, Sai Chowdary Gullapally, Limin Yu, Michael Griffin, Anand Sampat, John Abel, Justin Lee, Amaro Taylor-Weiner

Domain generalization is critical for real-world applications of machine learning to microscopy images, including histopathology and fluorescence imaging. Artifacts in these modalities arise through a complex combination of factors relating to tissue collection and laboratory processing, as well as factors intrinsic to patient samples. In fluorescence imaging, these artifacts stem from variations across experimental batches. The complexity and subtlety of these artifacts make the enumeration of data domains intractable. Therefore, augmentation-based methods of domain generalization that require domain identifiers and manual fine-tuning are inadequate in this setting. To overcome this challenge, we introduce ContriMix, a domain generalization technique that learns to generate synthetic images by disentangling and permuting the biological content ("content") and technical variations ("attributes") in microscopy images. ContriMix does not rely on domain identifiers or handcrafted augmentations and makes no assumptions about the input characteristics of images. We assess the performance of ContriMix on two pathology datasets, Camelyon17-WILDS and RCC subtyping, which involve patch classification and Whole Slide Image label prediction respectively, and on one fluorescence microscopy dataset (RxRx1-WILDS). Without any access to domain identifiers at training or test time, ContriMix performs similarly to or better than current state-of-the-art methods on all of these datasets, motivating its use for microscopy image analysis in real-world settings where domain information is hard to come by. The code for ContriMix can be found at https://gitlab.com/huutan86/contrimix
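The idea of permuting attributes across images while keeping content fixed can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `mix_content_and_attributes`, the tensor shapes, and the linear "decoder" (a per-pixel product of a content map with an attribute matrix) are assumptions made purely for illustration, and the random arrays stand in for the outputs of learned content and attribute encoders.

```python
import numpy as np


def mix_content_and_attributes(contents, attributes, rng):
    """Pair each content tensor with the attribute of a randomly chosen batch sample.

    contents:   (N, H, W, K) toy "content" maps, one per image (hypothetical shape)
    attributes: (N, K, C) toy "attribute" matrices, one per image (hypothetical shape)
    Returns the synthetic images, shape (N, H, W, C), and the permutation used.
    """
    n = contents.shape[0]
    # Shuffle which attribute goes with which content; no domain labels are needed.
    perm = rng.permutation(n)
    shuffled_attributes = attributes[perm]
    # Toy decoder (an assumption): a per-pixel linear map from K content
    # channels to C image channels, applied independently to each sample.
    synthetic = np.einsum("nhwk,nkc->nhwc", contents, shuffled_attributes)
    return synthetic, perm


rng = np.random.default_rng(0)
contents = rng.random((4, 8, 8, 3))   # stand-in for content encoder outputs
attributes = rng.random((4, 3, 3))    # stand-in for attribute encoder outputs
synthetic, perm = mix_content_and_attributes(contents, attributes, rng)
print(synthetic.shape)  # (4, 8, 8, 3)
```

In a setup like this, each synthetic image would keep the label of the image that supplied its content, since only the technical attribute has been swapped. How the actual encoders and image synthesis are defined and trained is specified in the paper and the linked repository, not here.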
