Contrastive-learning-based methods have dominated sentence representation learning. These methods regularize the representation space by pulling similar sentence representations closer and pushing dissimilar ones apart, and they have proven effective in various NLP tasks, e.g., semantic textual similarity (STS). However, it is challenging for these methods to learn fine-grained semantics because they learn only from the inter-sentence perspective, i.e., their supervision signal comes from the relationships between data samples. In this work, we propose a novel denoising objective that learns from a different perspective: the intra-sentence perspective. By introducing both discrete and continuous noise, we generate noisy sentences and then train our model to restore them to their original form. Our empirical evaluations demonstrate that this approach delivers competitive results on both STS and a wide range of transfer tasks, holding up well against contrastive-learning-based methods. Notably, the proposed intra-sentence denoising objective complements existing inter-sentence contrastive methods and can be integrated with them to further enhance performance. Our code is available at https://github.com/xinghaow99/DenoSent.
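The abstract does not specify which discrete or continuous noise is used, so the following is only a minimal sketch of how an intra-sentence denoising example could be built, under loudly stated assumptions: discrete noise is taken to be random token deletion, continuous noise is taken to be additive Gaussian perturbation of token embeddings, and tokenization is plain whitespace splitting. The functions `discrete_noise`, `continuous_noise`, and `make_denoising_example` are hypothetical and are not taken from the DenoSent repository.

```python
import random
from typing import List, Optional, Sequence, Tuple


def discrete_noise(tokens: Sequence[str], p_delete: float = 0.15,
                   rng: Optional[random.Random] = None) -> List[str]:
    """Hypothetical discrete noise (an assumption): delete each token independently with probability p_delete."""
    if rng is None:
        rng = random.Random()
    if not tokens:
        return []
    kept = [tok for tok in tokens if rng.random() >= p_delete]
    # Never return an empty sentence: fall back to one randomly chosen original token.
    return kept if kept else [tokens[rng.randrange(len(tokens))]]


def continuous_noise(embeddings: Sequence[Sequence[float]], sigma: float = 0.1,
                     rng: Optional[random.Random] = None) -> List[List[float]]:
    """Hypothetical continuous noise (an assumption): add i.i.d. N(0, sigma^2) noise to every embedding dimension."""
    if rng is None:
        rng = random.Random()
    return [[x + rng.gauss(0.0, sigma) for x in vec] for vec in embeddings]


def make_denoising_example(sentence: str, p_delete: float = 0.15,
                           rng: Optional[random.Random] = None) -> Tuple[List[str], List[str]]:
    """Build an (input, target) pair: the model sees corrupted tokens and must restore the original sentence."""
    original = sentence.split()  # whitespace tokenization is a simplifying assumption
    noisy = discrete_noise(original, p_delete, rng)
    return noisy, original


if __name__ == "__main__":
    rng = random.Random(0)
    noisy, target = make_denoising_example("a man is playing a guitar on stage", p_delete=0.3, rng=rng)
    print("input :", " ".join(noisy))
    print("target:", " ".join(target))
```

The design point the sketch tries to capture is that the target is the sentence itself, so the supervision signal comes from within a single sentence rather than from its relationship to other samples in the batch, which is what distinguishes an intra-sentence objective from an inter-sentence contrastive one.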