Session-based recommendation aims to predict the intent of anonymous users from their limited behaviors. Modeling user behaviors involves two distinct rationales: co-occurrence patterns reflected by item IDs, and fine-grained preferences represented by item modalities (e.g., text and images). However, existing methods typically entangle these two causes, which prevents them from producing accurate and explainable recommendations. To this end, we propose a novel framework, DIMO, to disentangle the effects of ID and modality in this task. At the item level, we introduce a co-occurrence representation schema that explicitly incorporates co-occurrence patterns into ID representations. Simultaneously, DIMO aligns different modalities into a unified semantic space so that they share a common representation. At the session level, we present a multi-view self-supervised disentanglement approach, comprising a proxy mechanism and counterfactual inference, to disentangle ID and modality effects without supervised signals. Leveraging these disentangled causes, DIMO provides recommendations via causal inference and further creates two templates for generating explanations. Extensive experiments on multiple real-world datasets demonstrate the consistent superiority of DIMO over existing methods. Further analysis also confirms DIMO's effectiveness in generating explanations.
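To make the notion of co-occurrence patterns concrete, the following is a minimal sketch of how item co-occurrence within sessions might be counted. It is not DIMO's co-occurrence representation schema, which the abstract does not specify; the function name `cooccurrence_counts`, the item IDs, and the toy sessions are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sessions):
    """Count how often each unordered pair of distinct items shares a session.

    Illustrative assumption only: this is plain pair counting, not the
    representation schema used by DIMO.
    """
    counts = Counter()
    for session in sessions:
        # Deduplicate and sort so each pair has one canonical key per session.
        unique_items = sorted(set(session))
        for a, b in combinations(unique_items, 2):
            counts[(a, b)] += 1
    return counts


# Toy sessions of hypothetical item IDs.
sessions = [
    ["i1", "i2", "i3"],
    ["i2", "i3"],
    ["i1", "i3", "i3"],
]
counts = cooccurrence_counts(sessions)
print(counts[("i2", "i3")])
```

Counts of this kind capture which items tend to appear together regardless of their content, which is the signal the abstract attributes to item IDs as opposed to item modalities.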