ContriMix: Scalable stain color augmentation for domain generalization without domain labels in digital pathology

Tan H. Nguyen, Dinkar Juyal, Jin Li, Aaditya Prakash, Shima Nofallah, Chintan Shah, Sai Chowdary Gullapally, Limin Yu, Michael Griffin, Anand Sampat, John Abel, Justin Lee, Amaro Taylor-Weiner

Differences in staining and imaging procedures can cause significant color variations in histopathology images, leading to poor generalization when deploying deep-learning models trained on a different data source. Various color augmentation methods have been proposed to generate synthetic images during training to make models more robust, eliminating the need for stain normalization at test time. Many color augmentation methods leverage domain labels to generate synthetic images. This approach poses three significant challenges to scaling such a model. First, incorporating data from a new domain into deep-learning models trained on existing domain labels is not straightforward. Second, dependency on domain labels prevents the use of pathology images without domain labels to improve model performance. Third, implementation of these methods becomes complicated when multiple domain labels (e.g., patient identification, medical center, etc.) are associated with a single image. We introduce ContriMix, a novel domain-label-free stain color augmentation method based on DRIT++, a style-transfer method. ContriMix leverages sample stain color variation within a training minibatch and random mixing to extract content and attribute information from pathology images. This information can be used by a trained ContriMix model to create synthetic images to improve the performance of existing classifiers. ContriMix outperforms competing methods on the Camelyon17-WILDS dataset. Its performance is consistent across different slides in the test set while being robust to the color variation from rare substances in pathology images. We make our code and trained ContriMix models available for research use. The code for ContriMix can be found at https://gitlab.com/huutan86/contrimix
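
The random mixing step described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes the content and attribute tensors have already been produced by some encoders (not shown), and it assumes, purely for illustration, that a synthetic image is formed by a per-pixel linear product of content and attribute. The function name `mix_within_minibatch`, the tensor shapes, and the parameter `num_mixes` are all hypothetical.

```python
import numpy as np


def mix_within_minibatch(contents, attributes, num_mixes, rng):
    """Pair each image's content with attributes borrowed from the same minibatch.

    Assumed (hypothetical) shapes:
      contents:   (B, H, W, K)  per-pixel content, K channels
      attributes: (B, K, 3)     per-image attribute mapping K channels to RGB
    Returns:
      synthetic:  (B, num_mixes, H, W, 3) synthetic images
      donors:     (B, num_mixes) index of the image each attribute came from
    No domain labels are used: donors are drawn uniformly from the minibatch.
    """
    batch_size = contents.shape[0]
    # For every image, draw num_mixes donor indices from the same minibatch.
    donors = rng.integers(0, batch_size, size=(batch_size, num_mixes))
    # Gather the donors' attributes: shape (B, num_mixes, K, 3).
    borrowed = attributes[donors]
    # Illustrative assumption: image = content (per pixel) times attribute.
    synthetic = np.einsum("bhwk,bmkc->bmhwc", contents, borrowed)
    return synthetic, donors


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    contents = rng.random((4, 8, 8, 3))    # stand-in for encoder output
    attributes = rng.random((4, 3, 3))     # stand-in for encoder output
    synthetic, donors = mix_within_minibatch(contents, attributes, 2, rng)
    print(synthetic.shape)
```

In this sketch each synthetic image keeps the content of its source image while taking on the attribute of a randomly chosen image in the same minibatch, which is how the stain variation already present in the minibatch stands in for domain labels.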
