Whitening-based Contrastive Learning of Sentence Embeddings

This paper presents a whitening-based contrastive learning method for sentence embeddings (WhitenedCSE), which combines contrastive learning with a novel shuffled group whitening. In general, contrastive learning pulls distortions of a single sample (i.e., positive samples) together and pushes negative samples apart, which promotes alignment and uniformity in the feature space, respectively. A popular alternative to the "pushing" operation is to whiten the feature space, which scatters all samples to achieve uniformity. Because whitening and contrastive learning are largely redundant with respect to uniformity, they are usually used separately and do not easily work together. For the first time, this paper integrates whitening into the contrastive learning scheme, which yields two benefits. 1) Better uniformity. We find that the two approaches are not entirely redundant but are in fact complementary, because their uniformity mechanisms differ. 2) Better alignment. We randomly divide the feature into multiple groups along the channel axis and perform whitening independently within each group. By shuffling the group division, we derive multiple distortions of a single sample and thus increase the diversity of positive samples. Using multiple positive samples with enhanced diversity further improves contrastive learning through better alignment. Extensive experiments on seven semantic textual similarity (STS) tasks show that our method achieves consistent improvement over the contrastive learning baseline and sets a new state of the art, e.g., a Spearman correlation of 78.78% (+2.53% with BERT-base) on STS tasks.
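The shuffled group whitening step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the choice of ZCA whitening, the `eps` regularizer, and the use of batch statistics are all assumptions made for illustration. Each call to `group_whiten` whitens one random partition of the channels, and repeating it with a fresh permutation gives another distortion of the same batch that could serve as an additional positive sample.

```python
import numpy as np


def group_whiten(x, perm, num_groups, eps=1e-5):
    """Whiten each channel group of x (shape: batch x dim) independently.

    `perm` is a permutation of the channel indices; consecutive chunks of
    the permuted indices form the groups. ZCA whitening is an assumption.
    """
    n, d = x.shape
    assert d % num_groups == 0, "dim must be divisible by num_groups"
    out = np.empty_like(x, dtype=np.float64)
    for idx in np.split(perm, num_groups):
        # Center the channels of this group over the batch.
        g = x[:, idx] - x[:, idx].mean(axis=0, keepdims=True)
        # Regularized covariance of the group.
        cov = g.T @ g / n + eps * np.eye(len(idx))
        # Symmetric inverse square root of the covariance (ZCA transform).
        vals, vecs = np.linalg.eigh(cov)
        w = vecs @ np.diag(vals ** -0.5) @ vecs.T
        out[:, idx] = g @ w
    return out


def shuffled_group_whitening(x, num_groups, num_views, rng):
    """Return `num_views` whitened copies of x, each with a new group split."""
    views = []
    for _ in range(num_views):
        perm = rng.permutation(x.shape[1])  # shuffle the group division
        views.append(group_whiten(x, perm, num_groups))
    return views


# Hypothetical usage: 256 sentence features of width 16, 4 groups, 3 views.
rng = np.random.default_rng(0)
features = rng.normal(size=(256, 16))
positives = shuffled_group_whitening(features, num_groups=4, num_views=3, rng=rng)
```

Because each view is whitened under a different channel partition, the views of one sentence differ from each other while every group stays decorrelated within the batch, which is the source of the extra positive-sample diversity that the abstract refers to.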
