Beyond Instance-Level Alignment and Uniformity: Semantic Factor Learning for Collaborative Filtering

Collaborative filtering (CF) is widely used in recommender systems (RecSys) due to its simplicity and efficiency. However, existing CF methods follow an instance-level learning paradigm. During instance-level learning, a large number of uninteracted user-item instances, many of which involve items the user may in fact be interested in, are incorrectly treated as true negative samples, which severely limits the generalization and scalability of models. Moreover, mainstream graph convolutional networks (GCNs) inherently suffer from high computational cost and over-smoothing, which limit their ability to capture higher-order connectivity and lead to poor generalization under sparse supervision signals. To address these limitations, we propose Semantic Factor enhanced Alignment and Uniformity (SaFeAU), a novel framework that augments interacted instances with semantic factors, thereby mitigating false negative labeling and enabling matrix factorization (MF) to capture high-order CF signals without graph neighborhood aggregation. Specifically, SaFeAU consists of three tightly coupled components. First, Semantic Factor Routing (SFR) disentangles item representations into independent, global semantic factors. Building on these factors, Semantic Factor Matching (SFM) identifies uninteracted items that share semantic factors with interacted ones and treats them as potential positive pairs, enriching the sparse supervision signals. Finally, Semantic Pairs Alignment (SPA) aligns both observed and potential positive pairs while promoting uniformity of user and item representations. Extensive experiments on four sparse real-world datasets show that SaFeAU consistently outperforms state-of-the-art GCN-based and MF-based CF methods in both recommendation accuracy and computational efficiency, confirming the effectiveness of the proposed semantics-enhanced learning paradigm.
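
To make the alignment and uniformity objectives and the factor-matching idea more concrete, the following is a minimal NumPy sketch. It uses the commonly cited formulation of alignment (mean squared distance between normalized positive pairs) and uniformity (log of the mean Gaussian potential over pairs), which may differ from the exact losses used in SaFeAU. The function names, the temperature `t`, and the hard per-item factor assignment `item_factor` are assumptions made for illustration only; the abstract does not specify how SFR assigns factors or how SFM scores candidates.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Project each row onto the unit hypersphere.
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def alignment_loss(user_emb, item_emb):
    # Row i of user_emb and row i of item_emb form one positive pair.
    u, v = l2_normalize(user_emb), l2_normalize(item_emb)
    return float(np.mean(np.sum((u - v) ** 2, axis=1)))

def uniformity_loss(emb, t=2.0):
    # Log of the average Gaussian potential over all distinct pairs;
    # lower values mean the embeddings are spread more evenly.
    x = l2_normalize(emb)
    sq_dist = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=2)
    upper = np.triu_indices(len(x), k=1)
    return float(np.log(np.mean(np.exp(-t * sq_dist[upper]))))

def potential_positives(interacted, item_factor):
    # Hypothetical stand-in for factor matching: item_factor[i] is a hard
    # factor id for item i. Returns uninteracted items whose factor id
    # matches the factor id of at least one interacted item.
    interacted = np.asarray(sorted(interacted), dtype=int)
    shares_factor = np.isin(item_factor, item_factor[interacted])
    shares_factor[interacted] = False  # keep only uninteracted items
    return np.flatnonzero(shares_factor)
```

In a setup like this, the pairs returned by `potential_positives` could be appended to the observed positive pairs before computing `alignment_loss`, while `uniformity_loss` is applied separately to the user and item embeddings.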
