Survivor bias in observational data drives the optimization of recommender systems toward local optima. Most current solutions re-mine existing human-system collaboration patterns to maximize longer-term satisfaction through reinforcement learning. From a causal perspective, however, mitigating survivor effects requires answering a counterfactual question, which is generally unidentifiable and inestimable. In this work, we propose a neural causal model to achieve counterfactual inference. Specifically, we first build a learnable structural causal model based on its available graphical representations, which qualitatively characterize the preference transitions. Survivor bias is then mitigated through counterfactual consistency. To identify the consistency, we use the Gumbel-max function as a structural constraint. To estimate the consistency, we apply reinforcement optimization and use Gumbel-Softmax as a trade-off to obtain a differentiable function. Both theoretical and empirical studies demonstrate the effectiveness of our solution.
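To make the roles of Gumbel-max and Gumbel-Softmax concrete, the following is a minimal NumPy sketch and not the authors' model. It assumes a toy three-item preference distribution; the function names, the logits, and the temperature are hypothetical choices for illustration only. Gumbel-max treats the categorical choice as a deterministic function of the logits and exogenous Gumbel noise, so reusing the same noise under modified logits yields a counterfactual outcome. Gumbel-Softmax replaces the argmax with a temperature-scaled softmax so that the choice becomes differentiable.

```python
import numpy as np

def sample_gumbel(shape, rng):
    """Draw standard Gumbel(0, 1) noise via the inverse-CDF transform."""
    u = rng.uniform(low=1e-12, high=1.0, size=shape)
    return -np.log(-np.log(u))

def gumbel_max(logits, noise):
    """Hard categorical choice: argmax of the logits perturbed by Gumbel noise."""
    return int(np.argmax(logits + noise))

def gumbel_softmax(logits, noise, temperature):
    """Differentiable relaxation: softmax of the perturbed logits at a temperature."""
    z = (logits + noise) / temperature
    z = z - z.max()  # subtract the max to stabilise the exponentials
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)

# Hypothetical preference distribution over three items (illustrative only).
logits = np.log(np.array([0.2, 0.5, 0.3]))

# The exogenous noise is drawn once and then held fixed.
noise = sample_gumbel(logits.shape, rng)

# Factual outcome under the observed logits.
hard = gumbel_max(logits, noise)

# Counterfactual outcome: same noise, logits modified to favour the chosen item.
cf_logits = logits.copy()
cf_logits[hard] += 1.0
cf_hard = gumbel_max(cf_logits, noise)

# Relaxed choice; a low temperature pushes it toward a one-hot vector.
soft = gumbel_softmax(logits, noise, temperature=0.1)
```

In this sketch, raising only the logit of the item that was actually chosen cannot change the outcome while the noise is held fixed, which is the kind of consistency between factual and counterfactual outcomes that a Gumbel-max structural constraint provides. Lowering the temperature moves `soft` closer to the one-hot encoding of `hard`, at the cost of higher-variance gradients in an autodiff framework.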