Debiased recommendation with a randomized dataset has shown very promising results in mitigating system-induced biases. However, compared with the more extensively studied line of work that does not use a randomized dataset, it still lacks theoretical insight and an ideal optimization objective. To bridge this gap, we study the debiasing problem from a new perspective and propose to directly minimize an upper bound of an ideal objective function, which offers a potentially better solution to system-induced biases. First, we formulate a new ideal optimization objective function with a randomized dataset. Second, depending on the prior constraints that the adopted loss function may satisfy, we derive two different upper bounds of this objective function: a generalization error bound based on the triangle inequality and a generalization error bound based on separability. Third, we show that most existing related methods can be regarded as insufficient optimizations of these two upper bounds. Fourth, we propose a novel method called debiasing approximate upper bound with a randomized dataset (DUB), which optimizes these upper bounds more thoroughly. Finally, we conduct extensive experiments on a public dataset and a real product dataset to verify the effectiveness of DUB.