Although multi-interest recommenders have made significant progress in the matching stage, our research reveals that existing models tend to exhibit an under-clustered item embedding space, which leads to low discernibility between items and hampers item retrieval. This highlights the need for item embedding enhancement. However, item attributes, which serve as effective and straightforward side information for enhancement, are unavailable or incomplete in many public datasets because manual annotation is labor-intensive. This dilemma raises two questions: (1) Can we bypass manual annotation and simulate complete attribute information directly from the interaction data? (2) If so, how can we simulate attributes with high accuracy and low complexity in the matching stage? In this paper, we first establish the theoretical feasibility of the idea: the item-attribute correlation matrix can be approximated through elementary transformations on the item co-occurrence matrix. Then, based on this derivation, we propose a simple yet effective module, SimEmb (Item Embedding Enhancement via Simulated Attribute), for multi-interest recommendation in the matching stage. SimEmb simulates attributes from the co-occurrence matrix, discards the item ID-based embedding, and instead builds each item embedding as an attribute-weighted summation. Comprehensive experiments on four benchmark datasets demonstrate that our approach notably improves the clustering of item embeddings and significantly outperforms state-of-the-art models, with an average improvement of 25.59% on Recall@20.
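To make the idea of attribute-weighted summation concrete, the following is a minimal sketch and not the SimEmb implementation. The abstract does not specify which elementary transformations are applied to the co-occurrence matrix, so the sketch assumes a simple one (adding self-loops and row-normalizing) and assumes one pseudo-attribute per item. The function names `cooccurrence_matrix`, `simulated_attribute_weights`, and `enhanced_item_embeddings`, the toy sequences, and the embedding size are all hypothetical.

```python
import numpy as np


def cooccurrence_matrix(sequences, num_items):
    """Count how many user sequences contain each pair of distinct items."""
    C = np.zeros((num_items, num_items), dtype=np.float64)
    for seq in sequences:
        items = sorted(set(seq))  # ignore repeats within one sequence
        for i in items:
            for j in items:
                if i != j:
                    C[i, j] += 1.0
    return C


def simulated_attribute_weights(C):
    """Assumed transformation (not taken from the paper): add self-loops,
    then row-normalize so each item's weights over pseudo-attributes sum to 1."""
    A = C + np.eye(C.shape[0])
    return A / A.sum(axis=1, keepdims=True)


def enhanced_item_embeddings(weights, attribute_table):
    """Each item embedding is a weighted sum of attribute embeddings,
    replacing a per-item ID embedding lookup."""
    return weights @ attribute_table


# Toy interaction data: four user sequences over five items (hypothetical).
sequences = [[0, 1, 2], [0, 1], [3, 4], [3, 4, 2]]
C = cooccurrence_matrix(sequences, num_items=5)
W = simulated_attribute_weights(C)

# In a real model the attribute table would be learned; here it is random.
rng = np.random.default_rng(0)
attribute_table = rng.normal(size=(5, 8))
item_emb = enhanced_item_embeddings(W, attribute_table)
```

In this sketch, items that frequently co-occur share most of their weight mass over the same pseudo-attributes, so their embeddings are pulled toward each other, which is one plausible way a co-occurrence-derived weighting could tighten clusters in the embedding space.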