Negative sampling is essential for collaborative filtering based on implicit feedback: it constructs negative signals from massive unlabeled data to guide supervised learning. The state-of-the-art idea is to use hard negative samples, which carry more useful information, to form a better decision boundary. To balance efficiency and effectiveness, the vast majority of existing methods follow a two-pass approach: the first pass samples a fixed number of unobserved items from a simple static distribution, and the second pass selects the final negative items using a more sophisticated negative sampling strategy. However, selecting negative samples only from the original items is inherently restrictive, and the selected samples may not contrast well with positive samples. In this paper, we confirm this observation experimentally and identify two limitations of existing solutions: ambiguous trap and information discrimination. Our response to these limitations is to introduce augmented negative samples. This direction poses a substantial technical challenge, because constructing unconstrained negative samples may introduce excessive noise that distorts the decision boundary. To this end, we propose a novel, generic augmented negative sampling paradigm and provide a concrete instantiation. First, we disentangle the hard and easy factors of negative items. Next, we generate new candidate negative samples by augmenting only the easy factors in a regulated manner, carefully calibrating both the direction and the magnitude of the augmentation. Finally, we design an advanced negative sampling strategy to identify the final augmented negative samples; it considers not only the score function used in existing methods but also a new metric called augmentation gain. Extensive experiments on real-world datasets demonstrate that our method significantly outperforms state-of-the-art baselines.
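
For illustration only, the two-pass approach followed by existing methods (not the augmented paradigm proposed here) might be sketched as below. The sketch assumes a uniform distribution for the first pass and a dot-product score with highest-score selection for the second pass. The function name, its parameters, and the toy data are hypothetical and are not taken from the paper.

```python
import random


def dot(u, v):
    """Dot product of two equal-length vectors, used here as the score function."""
    return sum(a * b for a, b in zip(u, v))


def two_pass_negative_sample(user_vec, item_vecs, observed, num_candidates, rng):
    """Return the index of one hard negative item for a single user.

    Assumes the user has at least one unobserved item.
    """
    # Pass 1: draw a fixed number of candidates uniformly from unobserved items.
    unobserved = [i for i in range(len(item_vecs)) if i not in observed]
    candidates = rng.sample(unobserved, min(num_candidates, len(unobserved)))
    # Pass 2: keep the candidate that the current model scores highest,
    # i.e. the one that is hardest to tell apart from a positive item.
    return max(candidates, key=lambda i: dot(user_vec, item_vecs[i]))


# Toy data: 50 items with 8-dimensional embeddings (assumed sizes).
rng = random.Random(0)
item_vecs = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(50)]
user_vec = [rng.gauss(0, 1) for _ in range(8)]
observed = {1, 5, 7}
negative = two_pass_negative_sample(user_vec, item_vecs, observed, 10, rng)
```

Under these assumptions the returned item is always one of the original unobserved items, which is the restriction the abstract argues against.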