Recommendation with side information has drawn significant research interest due to its potential to mitigate the sparsity of user feedback. However, existing models struggle to generalize across diverse domains and types of side information. In particular, three challenges remain unaddressed: (1) the diverse formats of side information, including text sequences; (2) the diverse semantics of side information, which describes items and users at multiple levels in contexts different from recommendation systems; and (3) the diverse correlations in side information, which measure similarity over multiple objects beyond pairwise relations. In this paper, we introduce GENET (Generalized hypErgraph pretraiNing on sidE informaTion), which pre-trains user and item representations on feedback-irrelevant side information and fine-tunes them on user feedback data. GENET uses pre-training to prevent side information from overshadowing critical ID features and feedback signals, and it employs a hypergraph framework to accommodate diverse types of side information. During pre-training, GENET integrates hyperlink prediction and self-supervised contrastive tasks to capture fine-grained semantics at both local and global levels. Additionally, it introduces a unique strategy that enhances pre-training robustness by perturbing positive samples while preserving high-order relations. Extensive experiments demonstrate that GENET generalizes well, outperforming the SOTA method by up to 38% in top-N recommendation and sequential recommendation tasks on various datasets with different side information.
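To illustrate why a hypergraph suits correlations beyond pairwise relations, the following minimal sketch groups items that share an attribute value into a single hyperedge and builds the corresponding incidence matrix. It is not GENET's actual construction: the `side_info` format, the attribute strings, and the helper names `build_hyperedges` and `incidence_matrix` are all assumptions made for illustration.

```python
from collections import defaultdict


def build_hyperedges(side_info):
    """Group objects that share an attribute value into one hyperedge.

    side_info maps an object id to a list of attribute values
    (a hypothetical format, chosen only for this example).
    Returns a dict: attribute value -> sorted list of object ids.
    """
    edges = defaultdict(set)
    for obj, attrs in side_info.items():
        for attr in attrs:
            edges[attr].add(obj)
    # Keep only hyperedges that connect at least two objects.
    return {a: sorted(objs) for a, objs in edges.items() if len(objs) >= 2}


def incidence_matrix(nodes, hyperedges):
    """Return a |nodes| x |hyperedges| 0/1 incidence matrix as nested lists."""
    index = {n: i for i, n in enumerate(nodes)}
    names = sorted(hyperedges)
    H = [[0] * len(names) for _ in nodes]
    for j, name in enumerate(names):
        for obj in hyperedges[name]:
            H[index[obj]][j] = 1
    return H, names


# Toy side information (invented values, not from any dataset in the paper).
side_info = {
    "item_a": ["genre:comedy", "director:x"],
    "item_b": ["genre:comedy"],
    "item_c": ["genre:comedy", "director:x"],
    "item_d": ["genre:drama"],
}
edges = build_hyperedges(side_info)
H, names = incidence_matrix(sorted(side_info), edges)
```

Here the `genre:comedy` hyperedge connects three items at once, a relation that a plain graph could only approximate with three separate pairwise edges.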