Existing model-based interactive recommendation systems are trained by querying a world model to capture user preferences, but a world model learned from historical logged data easily suffers from biases such as popularity bias and sampling bias. Several debiasing methods have recently been proposed to address this. However, two essential drawbacks remain: 1) ignoring the dynamics of time-varying popularity leads to incorrect reweighting of items; 2) treating unknown samples as negatives during negative sampling introduces sampling bias. To overcome these two drawbacks, we develop a model called \textbf{i}dentifiable \textbf{D}ebiased \textbf{M}odel-based \textbf{I}nteractive \textbf{R}ecommendation (\textbf{iDMIR} for short). In iDMIR, for the first drawback, we devise a debiased causal world model based on the causal mechanism of the time-varying recommendation generation process, with identification guarantees; for the second drawback, we devise a debiased contrastive policy, which coincides with debiased contrastive learning and avoids sampling bias. Moreover, we demonstrate that the proposed method not only outperforms several recent interactive recommendation algorithms but also produces diverse recommendations.