A Topology-Aware Positive Sample Set Construction and Feature Optimization Method in Implicit Collaborative Filtering

Negative sampling strategies are widely used in implicit collaborative filtering to address issues such as data sparsity and class imbalance. However, these methods often introduce false negatives, which hinder the model's ability to learn users' latent preferences accurately. To mitigate this problem, existing methods adjust the negative sampling distribution based on statistical features from model training or on the hardness of negative samples. Nevertheless, these methods face two key limitations: (1) over-reliance on the model's current representation capabilities; (2) failure to exploit false negatives as latent positive samples that could guide the model toward learning user preferences more accurately. To address these issues, we propose a Topology-aware Positive Sample Set Construction and Feature Optimization method (TPSC-FO). First, we design a simple topological community-aware false negative identification (FNI) method and observe that topological community structures in interaction networks can effectively identify false negatives. Motivated by this observation, we develop a topology-aware positive sample set construction module. This module employs a differential community detection strategy to capture topological community structures in implicit feedback, coupled with personalized noise filtration, to reliably identify false negatives and convert them into positive samples. Additionally, we introduce a neighborhood-guided feature optimization module that refines positive sample features by incorporating neighborhood features in the embedding space, effectively mitigating noise in the positive samples. Extensive experiments on five real-world datasets and two synthetic datasets validate the effectiveness of TPSC-FO.
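
To make the two ideas in the abstract concrete, the following is a minimal illustrative sketch, not the TPSC-FO implementation. It makes several assumptions that are not taken from the paper: connected components of the user-item graph stand in for a real community detection algorithm, every un-interacted item in a user's community is treated as a candidate false negative with no personalized noise filtering, and neighborhood-guided refinement is reduced to blending an embedding with the mean of its neighbors. All function names and the `alpha` parameter are hypothetical.

```python
from collections import defaultdict


def find_communities(interactions):
    """Assign a community id to every user and item node.

    Assumption: connected components of the bipartite user-item graph are
    used as a crude stand-in for a proper community detection method.
    """
    adj = defaultdict(set)
    for user, item in interactions:
        adj[("u", user)].add(("i", item))
        adj[("i", item)].add(("u", user))

    community = {}
    next_id = 0
    for start in sorted(adj):
        if start in community:
            continue
        # Depth-first traversal labels one component at a time.
        community[start] = next_id
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbor in adj[node]:
                if neighbor not in community:
                    community[neighbor] = next_id
                    stack.append(neighbor)
        next_id += 1
    return community


def candidate_false_negatives(interactions):
    """Map each user to the un-interacted items in the user's community."""
    community = find_communities(interactions)

    seen = defaultdict(set)
    for user, item in interactions:
        seen[user].add(item)

    items_by_community = defaultdict(set)
    for (kind, name), cid in community.items():
        if kind == "i":
            items_by_community[cid].add(name)

    return {
        user: sorted(items_by_community[community[("u", user)]] - items)
        for user, items in seen.items()
    }


def smooth_with_neighbors(embedding, neighbor_embeddings, alpha=0.5):
    """Blend an embedding with the mean of its neighbors.

    alpha is a hypothetical mixing weight: 1.0 keeps the original embedding,
    0.0 replaces it with the neighborhood mean.
    """
    if not neighbor_embeddings:
        return list(embedding)
    count = len(neighbor_embeddings)
    mean = [sum(v[d] for v in neighbor_embeddings) / count
            for d in range(len(embedding))]
    return [alpha * e + (1 - alpha) * m for e, m in zip(embedding, mean)]


# Toy data: two disjoint groups of users and items.
interactions = [
    ("u1", "a"), ("u1", "b"), ("u2", "b"), ("u2", "c"),
    ("u3", "x"), ("u4", "x"), ("u4", "y"),
]
print(candidate_false_negatives(interactions))
print(smooth_with_neighbors([1.0, 0.0], [[0.0, 1.0], [0.0, 1.0]]))
```

In this toy data, user `u1` never interacted with item `c`, but both sit in the same component through `u2` and item `b`, so `c` is flagged as a candidate to promote into the positive set for `u1`. A realistic pipeline would replace the component step with an actual community detection algorithm and filter the candidates per user before converting them.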