Estimating position bias is a well-known challenge in Learning to Rank (L2R). Click data from e-commerce applications, such as targeted advertising and search engines, provides implicit but abundant feedback for improving personalized rankings. However, click data inherently contains biases, including position bias. Based on the position-based click model, Result Randomization and the Regression Expectation-Maximization (REM) algorithm have been proposed to estimate position bias, but both require observations of many distinct (item, position) pairs. In real-world advertising, marketers frequently display advertisements in a fixed, pre-determined order, so few distinct pairs appear in the training data; the resulting sparsity makes estimation difficult. We propose a variant of REM that uses item embeddings to alleviate the sparsity of (item, position) pairs. Using a public dataset and an internal carousel advertisement click dataset, we empirically show that item embeddings built with Latent Semantic Indexing (LSI) and a Variational Auto-Encoder (VAE) improve the accuracy of position bias estimation, and that the estimated position bias enhances Learning to Rank performance. We also show that LSI is more effective than VAE as an embedding method for position bias estimation.
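To make the position-based click model concrete, the following is a minimal sketch of plain tabular Expectation-Maximization under the assumption that P(click) = theta[position] * gamma[item], where theta is the examination probability (position bias) and gamma is the item relevance. It is not the REM algorithm itself, which fits relevance with a regression model in the M-step, nor the embedding-based variant proposed here. The function name `estimate_position_bias` and the toy data are hypothetical and exist only for illustration.

```python
import numpy as np

def estimate_position_bias(items, positions, clicks, n_items, n_positions, n_iter=50):
    """Tabular EM for the position-based model P(click) = theta[pos] * gamma[item].

    Illustrative sketch only: relevance is a per-item table here, whereas REM
    replaces that table with a regression model.
    """
    items = np.asarray(items)
    positions = np.asarray(positions)
    clicks = np.asarray(clicks, dtype=float)

    theta = np.full(n_positions, 0.5)  # examination probability per position
    gamma = np.full(n_items, 0.5)      # relevance probability per item

    # Observation counts per position and per item (at least 1 to avoid 0/0).
    pos_counts = np.maximum(np.bincount(positions, minlength=n_positions), 1)
    item_counts = np.maximum(np.bincount(items, minlength=n_items), 1)

    for _ in range(n_iter):
        t = theta[positions]
        g = gamma[items]
        denom = np.maximum(1.0 - t * g, 1e-12)

        # E-step: a click implies examined and relevant; otherwise use posteriors.
        p_exam = clicks + (1.0 - clicks) * t * (1.0 - g) / denom
        p_rel = clicks + (1.0 - clicks) * (1.0 - t) * g / denom

        # M-step: average the posteriors over each position and each item.
        theta = np.bincount(positions, weights=p_exam, minlength=n_positions) / pos_counts
        gamma = np.bincount(items, weights=p_rel, minlength=n_items) / item_counts

    return theta, gamma


# Toy log (hypothetical): 2 items x 2 positions, 10 impressions per pair.
# Click counts per (item, position): (0,0)=8, (0,1)=4, (1,0)=4, (1,1)=2.
toy_counts = {(0, 0): 8, (0, 1): 4, (1, 0): 4, (1, 1): 2}
items, positions, clicks = [], [], []
for (item, pos), n_clicks in toy_counts.items():
    for j in range(10):
        items.append(item)
        positions.append(pos)
        clicks.append(1 if j < n_clicks else 0)

theta, gamma = estimate_position_bias(items, positions, clicks, n_items=2, n_positions=2)
```

In this toy log every item appears at every position, which is the well-covered setting that the estimators assume. When items are shown in a fixed order, most (item, position) pairs are never observed, and the per-item averages in the M-step have nothing to separate relevance from examination; that is the sparsity problem the embedding-based variant is meant to address. Note also that the model determines theta and gamma only up to a shared scale factor, so estimated position biases are usually compared as ratios rather than absolute values.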