Accurately eliminating examination bias is pivotal to using click-through data to train unbiased ranking models. However, most examination-bias estimators are limited to the hypothesis of the Position-Based Model (PBM), which assumes that examination bias depends only on the rank of the document. Some recent works incorporate additional information when estimating examination bias, such as other clicks in the same query list and contextual information. However, they still do not model the impact of document representation on search engine result pages (SERPs), which strongly affects a user's perception of the document's relevance to the query during examination. To address this, we propose a Multi-Feature Integration Model (MFIM) in which examination bias depends on the document's representation in addition to its rank. Furthermore, we mine a key factor, slipoff counts, that indirectly reflects the influence of all perception-bias factors. Real-world experiments on the Baidu-ULTR dataset demonstrate the superior effectiveness and robustness of the new approach. The source code is available at \href{https://github.com/lixsh6/Tencent_wsdm_cup2023/tree/main/pytorch_unbias}{https://github.com/lixsh6/Tencent\_wsdm\_cup2023}
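The difference between the PBM hypothesis and a feature-aware examination model can be made concrete with a small sketch. It assumes the standard examination hypothesis, under which a click requires both examination and relevance. It contrasts a PBM propensity, which is a lookup by rank alone, with a hypothetical logistic examination model that also takes presentation features such as a slipoff count. The function names, the feature name `slipoff_count`, and all weights are illustrative assumptions. This is not the MFIM architecture from the linked repository.

```python
import math
from typing import Mapping, Sequence


def sigmoid(x: float) -> float:
    """Logistic function mapping a real score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def pbm_examination(rank: int, propensity: Sequence[float]) -> float:
    """PBM: the examination probability is looked up by rank (1-based) alone."""
    return propensity[rank - 1]


def feature_examination(
    rank: int,
    features: Mapping[str, float],
    weights: Mapping[str, float],
    bias: float = 0.0,
) -> float:
    """Hypothetical feature-aware examination model (illustrative, not MFIM).

    The examination probability is a logistic function of the rank plus any
    presentation features, e.g. a hypothetical "slipoff_count" feature.
    """
    score = bias + weights.get("rank", 0.0) * rank
    for name, value in features.items():
        score += weights.get(name, 0.0) * value
    return sigmoid(score)


def click_probability(p_examine: float, p_relevant: float) -> float:
    """Examination hypothesis: P(click) = P(examine) * P(relevant)."""
    return p_examine * p_relevant


if __name__ == "__main__":
    propensity = [0.9, 0.6, 0.4]  # placeholder PBM propensities for ranks 1..3
    weights = {"rank": -0.5, "slipoff_count": -0.3}  # arbitrary placeholder weights

    # Two documents shown at the same rank but with different presentation features.
    doc_a = {"slipoff_count": 0.0}
    doc_b = {"slipoff_count": 2.0}

    # Under PBM, both documents get the same examination probability.
    print(pbm_examination(2, propensity), pbm_examination(2, propensity))
    # Under the feature-aware model, their examination probabilities differ.
    print(feature_examination(2, doc_a, weights, bias=1.0),
          feature_examination(2, doc_b, weights, bias=1.0))
```

Under PBM, any two documents at the same rank share one examination probability. The feature-aware variant lets presentation signals shift that probability, which is the kind of dependence the abstract argues PBM cannot express.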