TWIN: TWo-stage Interest Network for Lifelong User Behavior Modeling in CTR Prediction at Kuaishou

Lifelong user behavior modeling, i.e., extracting a user's hidden interests from rich historical behaviors spanning months or even years, plays a central role in modern CTR prediction systems. Conventional algorithms mostly follow two cascading stages: a simple General Search Unit (GSU) for fast, coarse search over tens of thousands of long-term behaviors, and an Exact Search Unit (ESU) for effective Target Attention (TA) over the small number of finalists from the GSU. Although efficient, most existing algorithms suffer from a crucial limitation: the target-behavior relevance metrics of the GSU and the ESU are \textit{inconsistent}. As a result, the GSU often misses highly relevant behaviors while retrieving ones that the ESU considers irrelevant. In such a case, the TA in the ESU, no matter how it allocates attention, largely deviates from the user's real interests and thus degrades overall CTR prediction accuracy. To address this inconsistency, we propose \textbf{TWo-stage Interest Network (TWIN)}, whose Consistency-Preserved GSU (CP-GSU) adopts the identical target-behavior relevance metric as the TA in the ESU, making the two stages twins. Specifically, to break TA's computational bottleneck and extend it from the ESU to the GSU, i.e., from a behavior length of $10^2$ to $10^4$--$10^5$, we build a novel attention mechanism based on behavior feature splitting. For the inherent video features of a behavior, we compute their linear projections with efficient pre-computing and caching strategies. For the user-item cross features, we compress each into a one-dimensional bias term in the attention score calculation to save computational cost. The consistency between the two stages, together with the effective TA-based relevance metric in CP-GSU, yields significant performance gains in CTR prediction.
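The two-stage idea can be illustrated with a short NumPy sketch. This is a minimal illustration under our own assumptions, not the TWIN implementation. All names (`KeyCache`, `relevance_scores`, `cp_gsu`, `esu`, `w_key`, `w_query`, `w_cross`) are hypothetical, and the data are synthetic. We assume that each cross-feature vector is compressed to its scalar bias by a single learned vector `w_cross`, and for brevity the cached keys also serve as attention values. The sketch shows two things. First, projections of inherent video features depend only on the video, so they are cached per video ID. Second, one scoring function is shared by the CP-GSU top-k search and the ESU softmax attention.

```python
import numpy as np


class KeyCache:
    """Hypothetical per-video cache of projected inherent features.

    Inherent video features do not depend on the user or the target, so each
    video's linear projection can be computed once and reused across requests.
    """

    def __init__(self, w_key: np.ndarray):
        self.w_key = w_key      # (d_in, d_k) projection, assumed learned
        self.store = {}         # video_id -> projected key of shape (d_k,)
        self.misses = 0         # number of projections actually computed

    def get(self, video_id: int, inherent: np.ndarray) -> np.ndarray:
        if video_id not in self.store:
            self.misses += 1
            self.store[video_id] = inherent @ self.w_key
        return self.store[video_id]


def relevance_scores(query, keys, cross_feats, w_cross):
    """Shared target-behavior relevance metric used by BOTH stages.

    query: (d_k,) projected target; keys: (L, d_k) cached behavior keys;
    cross_feats: (L, c) user-item cross features; w_cross: (c,) assumed
    learned vector compressing each cross-feature vector to one scalar bias.
    """
    d_k = query.shape[0]
    dot = keys @ query / np.sqrt(d_k)   # (L,) scaled dot-product term
    bias = cross_feats @ w_cross        # (L,) one bias per behavior
    return dot + bias


def cp_gsu(scores, k):
    """Consistency-preserved GSU: indices of the k highest scores, descending."""
    top = np.argpartition(-scores, k - 1)[:k]
    return top[np.argsort(-scores[top])]


def esu(scores, values, top_idx):
    """ESU target attention over the finalists, reusing the same scores."""
    s = scores[top_idx]
    w = np.exp(s - s.max())             # numerically stable softmax
    w /= w.sum()
    return w @ values[top_idx]          # weighted sum -> user-interest vector


# Synthetic demo: 10^4 behaviors filtered down to 10^2 finalists.
rng = np.random.default_rng(0)
L, n_videos, d_in, d_k, c, k = 10_000, 2_000, 16, 8, 4, 100
video_table = rng.normal(size=(n_videos, d_in))   # inherent features per video
video_ids = rng.integers(0, n_videos, size=L)     # long behavior sequence
cross = rng.normal(size=(L, c))                   # user-item cross features
w_key = rng.normal(size=(d_in, d_k))
w_query = rng.normal(size=(d_in, d_k))
w_cross = rng.normal(size=c)

cache = KeyCache(w_key)
keys = np.stack([cache.get(int(v), video_table[v]) for v in video_ids])
query = video_table[0] @ w_query                  # pretend video 0 is the target
scores = relevance_scores(query, keys, cross, w_cross)
top_idx = cp_gsu(scores, k)
interest = esu(scores, keys, top_idx)             # keys double as values here
```

Because `cp_gsu` ranks behaviors by exactly the scores that `esu` later normalizes, every finalist outscores every discarded behavior under the ESU metric. A GSU with a different metric (for example, category matching) offers no such guarantee. In this toy setup, repeated videos also hit the cache, so the number of projections computed equals the number of distinct videos rather than the sequence length.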
