While most multimodal recommendation methods focus on mining modalities, we argue that fully utilizing both collaborative and multimodal information is pivotal in e-commerce scenarios where, as clarified in this work, user behaviors are rarely determined entirely by multimodal features. Combining these two distinct types of information raises additional challenges: 1) Modality erasure: vanilla graph convolution, though highly effective in collaborative filtering, erases multimodal information; 2) Modality forgetting: multimodal information tends to be gradually forgotten during training, as the recommendation loss essentially drives the learning of collaborative information. To this end, we propose STAIR, a novel approach that employs a STepwise grAph convolution to enable the co-existence of collaborative and multimodal Information in e-commerce Recommendation. In addition, STAIR starts from the raw multimodal features as an initialization and significantly alleviates the forgetting problem through constrained embedding updates. As a result, STAIR achieves state-of-the-art recommendation performance on three public e-commerce datasets at minimal computational and memory costs. Our code is available at https://github.com/yhhe2004/STAIR.
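To make the two challenges concrete, the following is a minimal numpy sketch, not the STAIR implementation: all graph sizes, the number of layers, and the retention coefficient `lam` are illustrative assumptions. It shows how repeated LightGCN-style convolution smooths item embeddings away from their multimodal initialization (modality erasure), and how one simple constrained update keeps learned embeddings close to the raw multimodal features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy user-item interaction matrix: 3 users x 4 items (hypothetical sizes).
A = rng.integers(0, 2, size=(3, 4)).astype(float)

# Symmetrically normalized adjacency, as used in vanilla graph convolution
# for collaborative filtering (LightGCN-style).
du = A.sum(axis=1, keepdims=True).clip(min=1)
di = A.sum(axis=0, keepdims=True).clip(min=1)
A_hat = A / np.sqrt(du) / np.sqrt(di)

# Item embeddings start from raw multimodal features (e.g., image/text
# encodings); the 8-dim features here are random stand-ins.
item0 = rng.normal(size=(4, 8))   # "multimodal" initialization
user0 = A_hat @ item0             # users aggregate their items' features

# Vanilla multi-layer convolution: repeated graph smoothing pulls the
# embeddings toward the graph's dominant structure, washing out
# item-specific multimodal signal ("modality erasure").
user, item = user0.copy(), item0.copy()
for _ in range(8):
    user, item = A_hat @ item, A_hat.T @ user

# One simple constrained update (illustrating the idea of keeping learned
# embeddings near their multimodal initialization to curb "forgetting"):
lam = 0.9  # retention coefficient (assumed, not from the paper)
item_constrained = lam * item0 + (1 - lam) * (A_hat.T @ user0)

# The constrained embedding stays much closer to the multimodal
# initialization than the deeply convolved one does.
d_conv = np.linalg.norm(item - item0)
d_cons = np.linalg.norm(item_constrained - item0)
```

The interpolation above is only one way to constrain updates; the paper's stepwise graph convolution and its actual update rule are specified in the full text.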