Whitening-based Contrastive Learning of Sentence Embeddings

This paper presents a whitening-based contrastive learning method for sentence embedding learning (WhitenedCSE), which combines contrastive learning with a novel shuffled group whitening. Generally, contrastive learning pulls distortions of a single sample (i.e., positive samples) close together and pushes negative samples far apart, which respectively promotes alignment and uniformity in the feature space. A popular alternative to the "pushing" operation is whitening the feature space, which scatters all the samples to achieve uniformity. Since whitening and contrastive learning are largely redundant with respect to uniformity, they are usually used separately and do not easily work together. For the first time, this paper integrates whitening into the contrastive learning scheme and obtains two benefits. 1) Better uniformity. We find that the two approaches are not totally redundant but are in fact complementary to some degree, because they achieve uniformity through different mechanisms. 2) Better alignment. We randomly divide the feature into multiple groups along the channel axis and perform whitening independently within each group. By shuffling the group division, we derive multiple distortions of a single sample and thus increase the diversity of positive samples. Consequently, using multiple positive samples with enhanced diversity further improves contrastive learning through better alignment. Extensive experiments on seven semantic textual similarity (STS) tasks show that our method achieves consistent improvement over the contrastive learning baseline and sets a new state of the art, e.g., 78.78% Spearman correlation on STS tasks (+2.53% with BERT-base).
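
The shuffled group whitening step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `zca_whiten` and `shuffled_group_whitening`, the choice of ZCA whitening, the `eps` regularizer, and the step that restores the original channel order are all assumptions made for illustration. The sketch only shows how a different random channel grouping yields a different whitened view of the same batch of features.

```python
import numpy as np


def zca_whiten(x, eps=1e-5):
    """Whiten x of shape (n_samples, n_channels) so its covariance is close to identity.

    ZCA whitening and the eps regularizer are assumptions of this sketch.
    """
    x = x - x.mean(axis=0, keepdims=True)          # center each channel
    cov = x.T @ x / (x.shape[0] - 1)               # sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov + eps * np.eye(cov.shape[0]))
    w = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T
    return x @ w


def shuffled_group_whitening(feats, num_groups, rng):
    """Return one whitened view of feats using a random channel grouping."""
    _, d = feats.shape
    if d % num_groups != 0:
        raise ValueError("number of channels must be divisible by num_groups")
    perm = rng.permutation(d)                      # random group division
    groups = np.split(feats[:, perm], num_groups, axis=1)
    whitened = np.concatenate([zca_whiten(g) for g in groups], axis=1)
    out = np.empty_like(whitened)
    out[:, perm] = whitened                        # restore channel order (assumption)
    return out


# Hypothetical usage: three views of one batch, each from a different grouping.
rng = np.random.default_rng(0)
feats = rng.normal(size=(256, 32))
feats[:, 1:] += 0.5 * feats[:, :1]                 # introduce cross-channel correlation
views = [shuffled_group_whitening(feats, num_groups=4, rng=rng) for _ in range(3)]
```

In a contrastive setup, the rows of the different views that correspond to the same sentence would serve as additional positive samples for that sentence; how they enter the loss is left out of this sketch.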
