Bridging Sequential and Contextual Features with a Dual-View of Fine-grained Core-Behaviors and Global Interest-Distribution

Click-through rate (CTR) prediction estimates the probability that a user clicks on a candidate item by modeling both the user's behavior sequence and the item's contextual features. The behavior sequence is particularly critical because it dynamically reflects real-time shifts in user interest. Traditional CTR models often aggregate this sequence into a single vector before interacting it with the contextual features. This approach loses behavior information during aggregation and severely limits the model's capacity to capture interactions between contextual features and specific user behaviors, which impairs its modeling of fine-grained behavioral details and degrades prediction accuracy. Conversely, the naive alternative of directly interacting every user behavior with the contextual features is computationally expensive and introduces significant noise from behaviors irrelevant to the candidate item. This noise tends to overwhelm the valuable signals produced by interactions with the behaviors that are more relevant to the candidate item. To resolve these issues, we propose the Core-Behaviors and Distributional-Compensation Dual-View Interaction Network (CDNet), which bridges the gap between sequential and contextual features from two complementary angles: a fine-grained interaction between the most relevant behaviors and the contextual features, and a coarse-grained interaction that models the user's overall interest distribution against the contextual features. By capturing important behavioral details without forgoing the holistic user interest, CDNet models the interplay between sequential and contextual features without imposing a significant computational burden. Extensive experiments validate the effectiveness of CDNet.
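To make the dual-view idea concrete, the following is a minimal sketch in plain Python, not the CDNet architecture itself. The abstract does not specify how relevance is scored, how core behaviors are selected, or how interactions are computed, so everything here is an assumption for illustration: the function name `dual_view_features`, dot-product relevance, top-k selection, element-wise products as the interaction, and a softmax over relevance scores as the interest distribution.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dual_view_features(behaviors, candidate, context, k=2):
    """Hypothetical two-view interaction (illustrative assumptions only)."""
    # Assumed relevance measure: dot product between behavior and candidate.
    scores = [dot(b, candidate) for b in behaviors]

    # Fine-grained view: keep only the k most relevant behaviors and
    # interact each one with the context (assumed: element-wise product).
    order = sorted(range(len(behaviors)), key=lambda i: scores[i], reverse=True)
    top_idx = order[:k]
    fine = [[x * c for x, c in zip(behaviors[i], context)] for i in top_idx]

    # Coarse-grained view: a softmax over all scores stands in for the
    # interest distribution; the pooled vector interacts with the context once.
    weights = softmax(scores)
    dim = len(context)
    pooled = [sum(w * b[d] for w, b in zip(weights, behaviors)) for d in range(dim)]
    coarse = [p * c for p, c in zip(pooled, context)]
    return top_idx, fine, coarse, weights


# Toy data: four 2-dimensional behavior embeddings.
behaviors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
candidate = [1.0, 0.0]
context = [0.5, 2.0]
top_idx, fine, coarse, weights = dual_view_features(behaviors, candidate, context, k=2)
print(top_idx)
```

Under these assumptions, the fine-grained view computes k behavior-context interactions instead of one per behavior, while the coarse-grained view adds a single pooled interaction that still reflects every behavior in the sequence.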
