Learning contrastive representations from pairwise comparisons has achieved remarkable success in fields such as natural language processing, computer vision, and information retrieval. Collaborative filtering algorithms based on pairwise learning are also rooted in this paradigm. A significant concern is that implicit feedback data carries no labels for negative instances, so randomly sampled negatives often include false negatives, which inevitably biases the learned embeddings. To address this issue, we introduce a novel correction method for sampling bias that yields a modified pairwise loss, called the debiased pairwise loss (DPL). The key idea underlying DPL is to correct the biased probability estimates that result from false negatives, thereby correcting the gradients so that they approximate those obtained from fully supervised data. Implementing DPL requires only a small modification to existing code. Experiments on five public datasets validate the effectiveness of the proposed learning method.
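The idea of correcting a biased probability estimate can be illustrated with a small sketch. The abstract does not state the DPL formula, so the correction below is an assumption, not the authors' loss: it treats the unlabeled items as a mixture of true negatives and false negatives, and subtracts the false-negative share using a hypothetical prior `tau_plus`. The function names, the `extra_pos_scores` input, and the clipping constant are likewise illustrative.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def biased_pairwise_loss(pos_scores, unl_scores):
    """Pairwise loss that treats every sampled unlabeled item as a negative.

    pos_scores: shape (B,), score of one observed item per user.
    unl_scores: shape (B, N), scores of N randomly sampled unlabeled items.
    """
    # Estimated probability that the observed item outranks a sampled item.
    p_unl = sigmoid(pos_scores[:, None] - unl_scores).mean(axis=1)
    return -np.log(p_unl).mean()


def debiased_pairwise_loss(pos_scores, unl_scores, extra_pos_scores,
                           tau_plus=0.1, eps=1e-8):
    """Illustrative mixture correction (an assumption, not the paper's formula).

    extra_pos_scores: shape (B, M), scores of M other observed items per user.
    tau_plus: assumed fraction of sampled "negatives" that are false negatives.
    """
    diff_unl = pos_scores[:, None] - unl_scores
    diff_pos = pos_scores[:, None] - extra_pos_scores
    p_unl = sigmoid(diff_unl).mean(axis=1)  # biased estimate
    p_pos = sigmoid(diff_pos).mean(axis=1)  # estimate against known positives
    # If p_unl = tau_plus * p_pos + (1 - tau_plus) * p_neg, solve for p_neg.
    p_neg = (p_unl - tau_plus * p_pos) / (1.0 - tau_plus)
    # Clip so the logarithm stays defined when the estimate is noisy.
    p_neg = np.clip(p_neg, eps, 1.0)
    return -np.log(p_neg).mean()


pos = np.array([2.0])
unl = np.array([[0.0, 1.0]])
extra_pos = np.array([[2.0]])

loss_biased = biased_pairwise_loss(pos, unl)
loss_debiased = debiased_pairwise_loss(pos, unl, extra_pos, tau_plus=0.1)
loss_no_correction = debiased_pairwise_loss(pos, unl, extra_pos, tau_plus=0.0)
```

With `tau_plus=0.0` the sketch reduces to the uncorrected loss, which mirrors the claim that only a small change to an existing pairwise-loss implementation is needed. How `tau_plus` would be chosen in practice is left open here.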