Batch effects arising from technical variation in histopathology staining protocols, scanners, and acquisition pipelines pose a persistent challenge for computational pathology: they hinder cross-batch generalization and limit the reliable deployment of models across clinical sites. In this work, we introduce Latent Manifold Compaction (LMC), an unsupervised representation learning framework that performs image harmonization by learning batch-invariant embeddings from a single source dataset through explicit compaction of stain-induced latent manifolds. This allows LMC to generalize to target-domain data unseen during training. Evaluated on three challenging benchmarks, both public and in-house, LMC substantially reduces batch-induced separation across datasets and consistently outperforms state-of-the-art normalization methods in downstream cross-batch classification and detection tasks.
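To make the idea of "compacting" a stain-induced latent manifold concrete, the following is a minimal illustrative sketch, not the LMC objective itself, which the abstract does not specify. It assumes that each source image is embedded under several stain perturbations and that compaction is measured as the mean squared distance of those view embeddings to their per-image centroid. The function name `compaction_loss` and the `(n_images, n_views, dim)` array layout are hypothetical choices made for this example.

```python
import numpy as np


def compaction_loss(z):
    """Mean squared distance of stain-perturbed view embeddings to their
    per-image centroid.

    z: array-like of shape (n_images, n_views, dim). The layout is an
       assumption made for this sketch, not taken from the paper.
    """
    z = np.asarray(z, dtype=float)
    # Centroid of all views belonging to the same underlying image.
    centroids = z.mean(axis=1, keepdims=True)
    # Squared Euclidean distance of each view to its centroid,
    # averaged over all images and views.
    return float(((z - centroids) ** 2).sum(axis=-1).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # 8 images, 4 stain-perturbed views each, 16-dimensional embeddings.
    embeddings = rng.normal(size=(8, 4, 16))
    print("loss on random embeddings:", compaction_loss(embeddings))
    # Views that collapse to a single point per image give zero loss.
    collapsed = np.repeat(embeddings[:, :1, :], 4, axis=1)
    print("loss on collapsed views:", compaction_loss(collapsed))
```

A term of this kind is trivially minimized by a constant embedding, so on its own it would discard tissue content along with stain variation. Any practical use would need to pair it with an objective that preserves image content, which this sketch leaves out.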