Unbiased learning to rank (ULTR) studies how to mitigate biases in implicit user feedback data such as clicks, and has received considerable attention recently. A popular ULTR approach for real-world applications uses a two-tower architecture, in which click modeling is factorized into a relevance tower that takes regular input features and a bias tower that takes bias-relevant inputs such as the position of a document. A successful factorization frees the relevance tower from biases. In this work, we identify a critical issue that existing ULTR methods have ignored: the bias tower can be confounded with the relevance tower via the underlying true relevance. In particular, positions are determined by the logging policy, i.e., the previous production model, which itself carries relevance information. We give both theoretical analysis and empirical results to show the negative effects of this correlation on the relevance tower. We then propose three methods that mitigate the confounding effects by better disentangling relevance and bias. Empirical results on both controlled public datasets and a large-scale industry dataset show the effectiveness of the proposed approaches.
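To make the two-tower factorization and the confounding path concrete, here is a minimal sketch. It is not the authors' implementation and does not cover the three proposed methods. It assumes an additive form (click logit = relevance logit + position-bias logit), a linear relevance tower, and a logging policy that ranks documents by a noisy copy of the true relevance. All names (`two_tower_click_prob`, `logged_positions`, `policy_noise`) are hypothetical and chosen only for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def two_tower_click_prob(features, position, rel_weights, pos_bias):
    """Additive two-tower click model (assumed form, for illustration only).

    The relevance tower sees only document features; the bias tower sees
    only the logged position. Their logits are summed before the sigmoid.
    """
    relevance_logit = sum(w * x for w, x in zip(rel_weights, features))
    bias_logit = pos_bias[position]
    return sigmoid(relevance_logit + bias_logit)


def logged_positions(relevances, policy_noise, rng):
    """Hypothetical logging policy: rank by true relevance plus Gaussian noise.

    Returns the 0-based position assigned to each document.
    """
    scores = [r + rng.gauss(0.0, policy_noise) for r in relevances]
    order = sorted(range(len(relevances)), key=lambda i: -scores[i])
    positions = [0] * len(relevances)
    for pos, doc in enumerate(order):
        positions[doc] = pos
    return positions


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


rng = random.Random(0)
relevances = [rng.random() for _ in range(200)]

# An accurate logging policy ranks exactly by true relevance ...
accurate = logged_positions(relevances, policy_noise=0.0, rng=rng)
# ... while a very noisy one is close to a random ranking.
randomized = logged_positions(relevances, policy_noise=100.0, rng=rng)

corr_accurate = pearson(relevances, accurate)
corr_randomized = pearson(relevances, randomized)
print("corr(relevance, position), accurate policy:", round(corr_accurate, 3))
print("corr(relevance, position), noisy policy:   ", round(corr_randomized, 3))
```

Under the accurate policy the position input is strongly (negatively) correlated with true relevance, so in this toy setup a bias tower fed only the position can absorb relevance signal that the relevance tower was meant to learn. Under the heavily randomized policy the correlation is close to zero and the two inputs are roughly independent.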