Click-through rate (CTR) prediction is a critical task in recommendation systems, serving as the final filtering step that ranks items for a user. Most recent state-of-the-art methods focus on modeling complex implicit and explicit feature interactions; however, they neglect the spurious correlations caused by confounding factors, which diminishes the model's generalization ability. We propose RE-SORT, a CTR prediction framework that REmoves Spurious cORrelations in mulTilevel feature interactions. It has two key components. First, a multilevel stacked recurrent (MSR) structure enables the model to efficiently capture diverse nonlinear interactions from feature spaces at different levels. Second, a spurious correlation elimination (SCE) module uses Laplacian kernel mapping and sample reweighting to eliminate the spurious correlations concealed within the multilevel features, allowing the model to focus on the true causal features. Extensive experiments on four challenging CTR datasets, our production dataset, and an online A/B test demonstrate that the proposed method achieves state-of-the-art performance in both accuracy and speed. The code, models, and dataset will be released at https://github.com/RE-SORT.
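To make the idea behind the SCE module more concrete, the following is a minimal sketch of decorrelation by sample reweighting in a Laplacian kernel feature space. It is not the authors' implementation: the abstract does not specify how the kernel mapping and the reweighting are combined, so this sketch assumes a random Fourier feature approximation of the Laplacian kernel and a projected gradient step that shrinks the weighted cross-covariance between two mapped features. All function names, the step size, and the toy data are hypothetical.

```python
import math
import random


def laplacian_rff(values, num_features=8, gamma=1.0, seed=0):
    """Map scalars to random Fourier features whose inner products
    approximate the Laplacian kernel exp(-gamma * |x - y|)."""
    rng = random.Random(seed)
    # Frequencies follow Cauchy(0, gamma), sampled through the inverse CDF.
    freqs = [gamma * math.tan(math.pi * (rng.random() - 0.5))
             for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)
    return [[scale * math.cos(f * x + p) for f, p in zip(freqs, phases)]
            for x in values]


def weighted_cross_cov_loss(u, v, w):
    """Squared Frobenius norm of the weighted cross-covariance of u and v."""
    du, dv = len(u[0]), len(v[0])
    mu = [sum(wi * ui[j] for wi, ui in zip(w, u)) for j in range(du)]
    mv = [sum(wi * vi[k] for wi, vi in zip(w, v)) for k in range(dv)]
    cov = [[sum(wi * ui[j] * vi[k] for wi, ui, vi in zip(w, u, v))
            - mu[j] * mv[k] for k in range(dv)] for j in range(du)]
    loss = sum(c * c for row in cov for c in row)
    return loss, cov, mu, mv


def learn_sample_weights(u, v, steps=50, lr=0.01):
    """Projected gradient descent on the sample weights (assumed scheme).
    Returns the best weights seen, so the loss never exceeds the uniform one."""
    n = len(u)
    w = [1.0 / n] * n
    best_loss, cov, mu, mv = weighted_cross_cov_loss(u, v, w)
    best_w = list(w)
    for _ in range(steps):
        grad = [2.0 * sum(cov[j][k] * (ui[j] * vi[k] - ui[j] * mv[k] - mu[j] * vi[k])
                          for j in range(len(mu)) for k in range(len(mv)))
                for ui, vi in zip(u, v)]
        mean_grad = sum(grad) / n
        # Step, keep weights positive, then renormalize so they sum to one.
        w = [max(wi - lr * (g - mean_grad), 1e-6) for wi, g in zip(w, grad)]
        total = sum(w)
        w = [wi / total for wi in w]
        loss, cov, mu, mv = weighted_cross_cov_loss(u, v, w)
        if loss < best_loss:
            best_loss, best_w = loss, list(w)
    return best_w, best_loss


# Toy data: two features that correlate only through a shared confounder.
rng = random.Random(1)
confounder = [rng.gauss(0, 1) for _ in range(40)]
feat_a = [c + 0.3 * rng.gauss(0, 1) for c in confounder]
feat_b = [c + 0.3 * rng.gauss(0, 1) for c in confounder]

u = laplacian_rff(feat_a, seed=2)
v = laplacian_rff(feat_b, seed=3)
uniform = [1.0 / len(u)] * len(u)
loss_before = weighted_cross_cov_loss(u, v, uniform)[0]
weights, loss_after = learn_sample_weights(u, v)
print(loss_before, loss_after)
```

In a full training pipeline, weights of this kind would typically scale each sample's contribution to the prediction loss, so that samples reinforcing the confounded dependence count for less. How RE-SORT applies them across its multilevel features is not described in the abstract.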