We study the problem of inferring substitutable and complementary items, which underpins applications such as recommending alternatives and follow-up purchases. Existing approaches typically either learn from behavior-derived item-item associations with graph neural networks (GNNs) or rely on item content alone. However, these methods often overlook two key challenges: (i) user behaviors (e.g., co-view and co-purchase) provide only noisy, weak supervision, and (ii) behavior signals are long-tailed, leaving many items with sparse associations. We propose MMSC, a self-supervised multi-modal relational representation learning framework with two components: a multi-modal foundation model adapted to encode item metadata, and a self-supervised denoising module that learns relationship-aware representations from noisy user behaviors. A hierarchical aggregation mechanism unifies the two. We further use LLM-assisted supervision to mitigate noise in the behavior-derived supervision during training. Experiments on five real-world datasets show that MMSC consistently outperforms existing baselines, with average improvements of 26.1% for substitutable and 39.2% for complementary item inference, and that it remains effective for cold-start items. Our code is publicly available for reproducibility.