Collaborative filtering (CF) is a widely used technique that predicts user preferences from past interactions. Negative sampling plays a vital role in training CF-based models with implicit feedback. In this paper, we revisit existing sampling methods from a novel perspective based on the sampling area. We point out that current sampling methods mainly perform Point-wise or Line-wise sampling, which lacks flexibility and leaves a significant portion of the hard sampling area unexplored. To address this limitation, we propose Dimension Independent Mixup for Hard Negative Sampling (DINS), the first Area-wise sampling method for training CF-based models. DINS comprises three modules: Hard Boundary Definition, Dimension Independent Mixup, and Multi-hop Pooling. Experiments on real-world datasets with both matrix factorization and graph-based models demonstrate that DINS outperforms other negative sampling methods. Our work contributes a new perspective, introduces Area-wise sampling, and presents DINS as a novel approach that achieves state-of-the-art performance for negative sampling. Our implementations are available in PyTorch.
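To make the Point-wise, Line-wise, and Area-wise distinction concrete, the following is a minimal geometric sketch in plain Python. It is not the DINS implementation: the function names (`point_wise`, `line_wise`, `area_wise`), the toy embeddings, and the uniformly random per-dimension weights are all assumptions chosen for illustration, since the abstract does not specify how the mixing weights are computed. The sketch only shows that a single shared mixing weight confines the synthetic negative to a line segment, whereas an independent weight per dimension lets it fall anywhere in the region spanned by the positive and the hard negative.

```python
import random


def dot(u, v):
    """Inner product of two equal-length embedding vectors."""
    return sum(a * b for a, b in zip(u, v))


def point_wise(user, candidates):
    """Point-wise: pick the existing candidate the user embedding scores highest."""
    return max(candidates, key=lambda item: dot(user, item))


def line_wise(pos, neg, lam):
    """Line-wise: one scalar weight shared by all dimensions, so the result
    lies on the segment between pos and neg."""
    return [lam * p + (1.0 - lam) * n for p, n in zip(pos, neg)]


def area_wise(pos, neg, rng):
    """Area-wise: an independent weight per dimension, so the result can lie
    anywhere in the axis-aligned box spanned by pos and neg.
    Uniform random weights are an illustrative assumption only."""
    weights = [rng.random() for _ in pos]
    return [w * p + (1.0 - w) * n for w, p, n in zip(weights, pos, neg)]


# Hypothetical toy embeddings (3 dimensions).
rng = random.Random(0)
user = [1.0, 0.5, -0.2]
pos = [0.9, 0.4, 0.1]
candidates = [[0.1, -0.3, 0.8], [0.7, 0.2, -0.5], [-0.4, 0.6, 0.3]]

hard = point_wise(user, candidates)       # hardest existing negative
on_line = line_wise(pos, hard, lam=0.5)   # midpoint of the segment
in_area = area_wise(pos, hard, rng)       # a point inside the spanned box
```

In this sketch the Line-wise sampler has one degree of freedom (`lam`), while the Area-wise sampler has one per embedding dimension, which is the flexibility gap the abstract refers to.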