Group Contrastive Learning for Weakly Paired Multimodal Data

We present GROOVE, a semi-supervised multi-modal representation learning approach for high-content perturbation data in which samples across modalities are weakly paired through shared perturbation labels but lack direct correspondence. Our primary contribution is GroupCLIP, a novel group-level contrastive loss that sits between CLIP, which targets paired cross-modal data, and SupCon, which targets uni-modal supervised contrastive learning, and thereby fills a fundamental gap in contrastive learning for weakly paired settings. We integrate GroupCLIP with an on-the-fly backtranslating autoencoder framework to encourage cross-modally entangled representations while maintaining group-level coherence within a shared latent space. We also introduce a comprehensive combinatorial evaluation framework that assesses each representation learner in combination with multiple optimal transport aligners, addressing key limitations of existing evaluation strategies. This framework includes novel simulations that systematically vary shared versus modality-specific perturbation effects, enabling principled assessment of method robustness. Our combinatorial benchmarking reveals that no aligner yet uniformly dominates across settings or modality pairs. Across simulations and two real single-cell genetic perturbation datasets, GROOVE performs on par with or outperforms existing approaches on downstream cross-modal matching and imputation tasks. Our ablation studies demonstrate that GroupCLIP is the key component driving the performance gains. These results highlight the importance of leveraging group-level constraints for effective multi-modal representation learning when only weak pairing is available.
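
To make the idea of a group-level cross-modal contrastive loss concrete, the following is a minimal sketch and not the GroupCLIP definition itself, which the abstract does not spell out. It assumes one plausible form: each embedding in one modality treats every embedding in the other modality that shares its perturbation label as a positive, the log-softmax scores are averaged over those positives (as in SupCon), and the loss is symmetrized over both directions (as in CLIP). The function names, the `temperature` default, and the choice to skip anchors with no cross-modal positive are all illustrative assumptions. It is written in pure Python so that it runs without dependencies.

```python
import math


def _normalize(v):
    # Scale a vector to unit length so dot products are cosine similarities.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def _directional_loss(anchors, anchor_labels, targets, target_labels, temperature):
    # For each anchor, average -log softmax over all targets sharing its label.
    losses = []
    for a, label in zip(anchors, anchor_labels):
        logits = [sum(x * y for x, y in zip(a, t)) / temperature for t in targets]
        positives = [k for k, lt in enumerate(target_labels) if lt == label]
        if not positives:
            continue  # assumption: anchors with no cross-modal positive are skipped
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        losses.append(-sum(logits[k] - log_denom for k in positives) / len(positives))
    return sum(losses) / len(losses) if losses else 0.0


def group_contrastive_loss(z_a, labels_a, z_b, labels_b, temperature=0.1):
    # Hypothetical group-level loss: symmetric over the two modalities.
    z_a = [_normalize(v) for v in z_a]
    z_b = [_normalize(v) for v in z_b]
    a_to_b = _directional_loss(z_a, labels_a, z_b, labels_b, temperature)
    b_to_a = _directional_loss(z_b, labels_b, z_a, labels_a, temperature)
    return 0.5 * (a_to_b + b_to_a)


# Toy example: three samples per modality, two perturbation groups, no 1:1 pairing.
z_a = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]]
labels_a = ["g1", "g1", "g2"]
z_b = [[0.9, 0.1], [0.1, 0.9], [0.0, 1.0]]
labels_b = ["g1", "g2", "g2"]

aligned = group_contrastive_loss(z_a, labels_a, z_b, labels_b)
# Swapping the group labels in modality B makes same-label embeddings dissimilar.
misaligned = group_contrastive_loss(z_a, labels_a, z_b, ["g2", "g1", "g1"])
print(aligned < misaligned)
```

Under these assumptions, if every group contains exactly one sample per modality, each anchor has a single positive and the sketch reduces to the symmetric CLIP objective. With larger groups, the averaging over positives mirrors SupCon, applied across modalities instead of within one.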
