Extracting users' interests from their lifelong behavior sequence is crucial for predicting Click-Through Rate (CTR). Most current methods employ a two-stage process for efficiency: they first select historical behaviors related to the candidate item and then deduce the user's interest from this narrowed-down behavior sub-sequence. This two-stage paradigm, though effective, leads to information loss. Moreover, relying solely on users' lifelong click behaviors gives an incomplete picture of their interests, which leads to suboptimal performance. We introduce the Deep Group Interest Network (DGIN), an end-to-end method that models the user's entire behavior history. This history includes all post-registration actions, such as clicks, cart additions, and purchases, and so supports a more nuanced understanding of the user. For efficiency, we first group the full range of behaviors by a relevant key (such as item_id). This grouping reduces the behavior length significantly, from O(10^4) to O(10^2). To mitigate the potential loss of information due to grouping, we incorporate two categories of group attributes. Within each group, we compute statistical information over the heterogeneous behaviors (such as behavior counts) and employ self-attention mechanisms to highlight unique behavior characteristics (such as behavior type). From this reorganized behavior data, we derive the user's interests with a Transformer. Additionally, we identify the subset of behaviors in the lifelong behavior sequence that share the same item_id as the candidate item. This subset reveals the user's decision-making process with respect to the candidate item, improving prediction accuracy. Comprehensive evaluation on both industrial and public datasets validates DGIN's efficacy and efficiency.
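The grouping step and the candidate-item subset described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Behavior` record, the function names, and the toy history are all hypothetical, and the self-attention and Transformer components are omitted. It only shows how grouping by item_id shortens the sequence, how per-group behavior counts can be computed, and how the behaviors that share the candidate's item_id can be picked out.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Behavior(NamedTuple):
    """Hypothetical record for one user action (not from the paper)."""
    item_id: str
    behavior_type: str  # e.g. "click", "cart", "purchase"
    timestamp: int


def group_by_item(behaviors):
    """Group a behavior sequence by item_id, keeping order within each group."""
    groups = defaultdict(list)
    for b in behaviors:
        groups[b.item_id].append(b)
    return dict(groups)


def group_statistics(group):
    """Count how often each behavior type occurs within one group."""
    return Counter(b.behavior_type for b in group)


def candidate_subsequence(behaviors, candidate_item_id):
    """Select the behaviors whose item_id matches the candidate item."""
    return [b for b in behaviors if b.item_id == candidate_item_id]


# Toy history: five behaviors over two items (illustrative data only).
history = [
    Behavior("A", "click", 1),
    Behavior("B", "click", 2),
    Behavior("A", "cart", 3),
    Behavior("A", "click", 4),
    Behavior("B", "purchase", 5),
]

groups = group_by_item(history)
stats = {item_id: group_statistics(g) for item_id, g in groups.items()}
target = candidate_subsequence(history, "A")

print(len(history), "behaviors ->", len(groups), "groups")
print(dict(stats["A"]))
print([b.behavior_type for b in target])
```

In this toy case the five behaviors collapse into two groups, one per item. In a full model, each group's counts and its per-behavior features would be turned into a group representation before the sequence of groups is passed to the interest-extraction layers.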