Recommenders built upon implicit collaborative filtering are typically trained to distinguish between users' positive and negative preferences. When direct observations of the latter are unavailable, negative training data are constructed with sampling techniques. But since items often exhibit clustering in the latent space, existing methods tend to oversample negatives from dense regions, resulting in homogeneous training data and limited model expressiveness. To address these shortcomings, we propose a novel negative sampler with diversity guarantees. Our approach first pairs each positive item of a user with an item that the user has not yet interacted with; this instance, called a hard negative, is the top-scoring item according to the model. Instead of discarding the remaining highly informative candidates, we store them in a user-specific cache. Next, our diversity-augmented sampler selects a representative subset of negatives from the cache, ensuring that they are dissimilar from the corresponding user's hard negatives. Our generator then combines these items with the hard negatives, replacing the latter to produce more effective (synthetic) negative training data that are both informative and diverse. Experiments show that our method consistently leads to superior recommendation quality without sacrificing computational efficiency.
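A minimal sketch of the pipeline described above, assuming dot-product scoring in a shared latent space and cosine similarity as the diversity criterion; the function name, cache size, and similarity threshold are illustrative assumptions, not the paper's actual hyperparameters:

```python
import numpy as np


def sample_negatives(user_vec, item_vecs, interacted, cache_size=10,
                     num_select=3, sim_threshold=0.8, rng=None):
    """Hard-negative mining with a diversity-filtered cache (illustrative sketch).

    user_vec: (d,) user latent vector; item_vecs: (n, d) item latent vectors;
    interacted: set of item ids the user has already interacted with.
    Returns the hard negative and a diverse subset drawn from the cache.
    """
    rng = rng or np.random.default_rng(0)
    scores = item_vecs @ user_vec                # model scores for all items
    # Rank non-interacted items by score, highest first.
    candidates = [i for i in np.argsort(-scores) if i not in interacted]
    hard_neg = candidates[0]                     # top-scoring non-interacted item
    cache = candidates[1:1 + cache_size]         # keep remaining informative items

    # Diversity filter: retain cached items dissimilar to the hard negative.
    h = item_vecs[hard_neg] / np.linalg.norm(item_vecs[hard_neg])
    diverse = [i for i in cache
               if item_vecs[i] @ h / np.linalg.norm(item_vecs[i]) < sim_threshold]
    chosen = (rng.choice(diverse, size=min(num_select, len(diverse)),
                         replace=False).tolist() if diverse else [])
    return hard_neg, chosen
```

In the full method, the chosen subset and the hard negative would be combined by the generator into synthetic negatives; this sketch stops at the sampling stage.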