Differentially private (DP) contrastive learning aims to learn general-purpose representations from sensitive data, easing privacy-leakage concerns for organizations that deploy or share embedding models trained on private user content. However, existing approaches suffer severe utility degradation because standard contrastive objectives carry excessive inter-sample dependency: each sample's gradient depends on all other samples in the batch, which amplifies the impact of DP noise. In this work, we argue that effective DP contrastive learning requires explicitly reducing this intrinsic inter-sample reliance. To this end, we propose DP-GCL, a principled DP contrastive learning framework that structurally limits gradient dependency by bounding group-level contributions. DP-GCL partitions each batch into small, disjoint groups and restricts each sample's negatives to samples within its own group, thereby localizing gradient influence and reducing sensitivity. To counteract the resulting loss of negative-sample diversity, we further introduce intra-group augmentation, which generates additional negative views without increasing privacy cost. Extensive experiments across eight datasets demonstrate that DP-GCL consistently advances the state of the art in both uni-modal and multi-modal contrastive learning under practical privacy budgets: it improves image classification accuracy by 5.6% and image-text retrieval accuracy by 20.1% over existing DP contrastive methods.
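To make the grouping idea concrete, the following is a minimal sketch of a contrastive loss whose negatives are restricted to within-group samples. It is an illustration under stated assumptions, not the DP-GCL implementation: the abstract does not specify the loss form, so an InfoNCE-style objective with cosine similarity is assumed, and the names `partition`, `group_losses`, `group_size`, and `temperature` are hypothetical. Gradient clipping, noise addition, and intra-group augmentation are omitted. The sketch only shows the structural property described above: changing one sample affects the loss of its own group and no other.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def partition(batch_size, group_size):
    """Split indices 0..batch_size-1 into disjoint consecutive groups."""
    return [list(range(start, min(start + group_size, batch_size)))
            for start in range(0, batch_size, group_size)]


def group_losses(view_a, view_b, group_size, temperature=0.5):
    """Return one InfoNCE-style loss per group (assumed loss form).

    For anchor i, the positive is view_b[i]; the negatives are the
    view_b[j] of the other samples in the same group only.
    """
    losses = []
    for group in partition(len(view_a), group_size):
        total = 0.0
        for i in group:
            logits = [cosine(view_a[i], view_b[j]) / temperature for j in group]
            positive = cosine(view_a[i], view_b[i]) / temperature
            # Log-sum-exp with max subtraction for numerical stability.
            top = max(logits)
            log_denominator = top + math.log(sum(math.exp(x - top) for x in logits))
            total += log_denominator - positive
        losses.append(total)
    return losses


# Toy batch: 8 samples, two views each, 4-dimensional embeddings.
rng = random.Random(0)
view_a = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(8)]
view_b = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(8)]
before = group_losses(view_a, view_b, group_size=4)

# Replace sample 0 (a member of the first group) in both views.
view_a[0] = [1.0, 0.0, 0.0, 0.0]
view_b[0] = [-1.0, 0.0, 0.0, 0.0]
after = group_losses(view_a, view_b, group_size=4)

# Indices of the groups whose loss changed after the replacement.
changed = [k for k in range(len(before)) if before[k] != after[k]]
print("groups affected by replacing sample 0:", changed)
```

In a full-batch contrastive loss, replacing sample 0 would alter the loss term of every anchor in the batch. Here the second group's loss is computed from untouched inputs, which is the localization of influence that a group-level contribution bound relies on.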