Multimedia content is predominant in the modern Web era. In real-world scenarios, different modalities reveal different aspects of item attributes and usually differ in their importance to users' purchase decisions. However, it is difficult for models to figure out users' true preferences towards different modalities, because there is a strong statistical correlation between modalities. Even worse, this strong correlation might mislead models into learning spurious preferences towards inconsequential modalities. As a result, when the data (modal feature) distribution shifts, the learned spurious preferences are not guaranteed to be as effective on the inference set as on the training set. We propose a novel MOdality DEcorrelating STable learning framework, MODEST for brevity, to learn users' stable preferences. Inspired by sample re-weighting techniques, the proposed method estimates a weight for each item such that the features from different modalities are decorrelated in the weighted distribution. We adopt the Hilbert-Schmidt Independence Criterion (HSIC) as the independence testing measure; HSIC is a kernel-based method capable of evaluating the degree of (possibly non-linear) dependence between two multi-dimensional variables. Our method can serve as a plug-and-play module for existing multimedia recommendation backbones. Extensive experiments on four public datasets and four state-of-the-art multimedia recommendation backbones show that the proposed method improves performance by a large margin.
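The abstract does not give MODEST's exact objective, so the following plain-Python sketch only illustrates the general idea of decorrelating two modalities by sample re-weighting, under stated assumptions: each item has two toy feature vectors (labelled "visual" and "textual" here), both modalities use a Gaussian (RBF) kernel with a fixed bandwidth, the decorrelation measure is the HSIC of the weighted empirical distribution (which reduces to the usual biased estimator tr(KHLH)/n^2 when all weights are equal), the weights are a softmax over free parameters, and a KL-to-uniform penalty, added here as an assumption, keeps the weights from collapsing. Every function name is hypothetical.

```python
import math
import random


def rbf_gram(feats, sigma=1.0):
    """Gaussian (RBF) kernel Gram matrix over a list of feature vectors."""
    return [[math.exp(-sum((a - b) ** 2 for a, b in zip(u, v)) / (2.0 * sigma ** 2))
             for v in feats] for u in feats]


def matvec(m, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def weighted_hsic(K, L, p):
    """HSIC of the weighted empirical distribution sum_i p_i * delta(x_i, y_i).

    With uniform p_i = 1/n this equals the biased estimator tr(KHLH) / n^2.
    """
    n = len(p)
    a, b = matvec(K, p), matvec(L, p)
    t1 = sum(p[i] * p[j] * K[i][j] * L[i][j] for i in range(n) for j in range(n))
    t2 = sum(pi * ai * bi for pi, ai, bi in zip(p, a, b))
    t3 = sum(pi * ai for pi, ai in zip(p, a)) * sum(pi * bi for pi, bi in zip(p, b))
    return t1 - 2.0 * t2 + t3


def hsic_grad(K, L, p):
    """Analytic gradient of weighted_hsic with respect to p (K, L symmetric)."""
    n = len(p)
    a, b = matvec(K, p), matvec(L, p)
    klp = [sum(K[i][j] * L[i][j] * p[j] for j in range(n)) for i in range(n)]
    k_pb = matvec(K, [pi * bi for pi, bi in zip(p, b)])
    l_pa = matvec(L, [pi * ai for pi, ai in zip(p, a)])
    pkp = sum(pi * ai for pi, ai in zip(p, a))  # p^T K p
    plp = sum(pi * bi for pi, bi in zip(p, b))  # p^T L p
    return [2 * klp[m] - 2 * (a[m] * b[m] + k_pb[m] + l_pa[m])
            + 2 * a[m] * plp + 2 * b[m] * pkp for m in range(n)]


def softmax(theta):
    top = max(theta)
    e = [math.exp(t - top) for t in theta]
    total = sum(e)
    return [v / total for v in e]


def kl_to_uniform(p):
    """KL(p || uniform). ASSUMPTION: regulariser that stops weights collapsing."""
    n = len(p)
    return sum(pi * math.log(n * pi) for pi in p if pi > 0)


def learn_item_weights(K, L, lam=0.01, lr=50.0, steps=100):
    """Minimise weighted HSIC + lam * KL(p || uniform) over softmax weights p."""
    n = len(K)
    theta = [0.0] * n  # start from uniform weights
    p = softmax(theta)
    f = weighted_hsic(K, L, p) + lam * kl_to_uniform(p)
    for _ in range(steps):
        g = [gh + lam * (math.log(n * max(pi, 1e-300)) + 1.0)
             for gh, pi in zip(hsic_grad(K, L, p), p)]
        mean_g = sum(pi * gi for pi, gi in zip(p, g))
        g_theta = [pi * (gi - mean_g) for pi, gi in zip(p, g)]  # chain rule through softmax
        step = lr
        while step > 1e-8:  # backtracking: accept only steps that lower the objective
            cand = [t - step * gt for t, gt in zip(theta, g_theta)]
            cand_p = softmax(cand)
            cand_f = weighted_hsic(K, L, cand_p) + lam * kl_to_uniform(cand_p)
            if cand_f < f:
                theta, p, f = cand, cand_p, cand_f
                break
            step /= 2.0
        else:
            break  # no improving step found; treat as converged
    return p


# Toy items whose "visual" and "textual" features are strongly correlated.
rng = random.Random(0)
visual = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(20)]
textual = [[x + 0.1 * rng.gauss(0, 1) for x in v] for v in visual]
K, L = rbf_gram(visual), rbf_gram(textual)

uniform = [1.0 / len(visual)] * len(visual)
weights = learn_item_weights(K, L)
before = weighted_hsic(K, L, uniform)
after = weighted_hsic(K, L, weights)
print(f"HSIC with uniform weights: {before:.5f}")
print(f"HSIC with learned weights: {after:.5f}")
```

Putting all of the weight on a single item drives the weighted HSIC to exactly zero, which is why some constraint of this kind is needed; how MODEST itself constrains the weights is not stated in this abstract. One plausible way to use the learned weights, again an assumption, is to scale each item's contribution to the backbone's training loss.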