Modeling user behavior is crucial in modern recommendation systems. Much research focuses on modeling users' lifelong sequences, which can be extremely long, sometimes running to thousands of items. These models use the target item to search the historical sequence for the most relevant items. However, training on lifelong sequences for click-through rate (CTR) prediction or personalized search ranking (PSR) is extremely difficult because ID embeddings are insufficiently learned, especially when the IDs in the lifelong sequence features do not appear in the samples of the training dataset. Additionally, existing target attention mechanisms struggle to learn the multi-modal representations of items in the sequence. The distributions of the multi-modal embeddings (text, image, and attributes) of a user's interacted items are not properly aligned, and they diverge across modalities. We also observe that users' search query sequences and item browsing sequences can fully depict users' intents and benefit from each other. To address these challenges, we propose a unified lifelong multi-modal sequence model called SEMINAR (Search Enhanced Multi-Modal Interest Network and Approximate Retrieval). Specifically, a network called the Pretraining Search Unit (PSU) learns the lifelong sequences of multi-modal query-item pairs in a pretraining-finetuning manner with multiple objectives: multi-modal alignment, next query-item pair prediction, query-item relevance prediction, etc. After pretraining, the downstream model loads the pretrained embeddings as initialization and finetunes the network. To accelerate online retrieval of multi-modal embeddings, we propose a multi-modal codebook-based product quantization strategy to approximate the exact attention calculation.
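As a rough illustration of how product quantization can approximate attention scores, the following is a minimal sketch, not the SEMINAR implementation. It applies plain product quantization to a single embedding per item and ignores the multi-modal codebook design, which the abstract does not specify. All function names, shapes, and the small k-means routine are assumptions made for this example. The idea shown is that each item embedding is stored as a few codebook indices, so the inner product between a target (query) embedding and every item in a long sequence reduces to table lookups and additions.

```python
import numpy as np


def train_codebooks(items, n_subspaces, n_codes, n_iters=10, seed=0):
    """Fit one k-means codebook per subspace (hypothetical helper).

    items: (N, D) array of item embeddings; D must be divisible by n_subspaces.
    Returns codebooks of shape (n_subspaces, n_codes, D // n_subspaces).
    """
    rng = np.random.default_rng(seed)
    n, d = items.shape
    sub_dim = d // n_subspaces
    codebooks = np.empty((n_subspaces, n_codes, sub_dim))
    for m in range(n_subspaces):
        chunk = items[:, m * sub_dim:(m + 1) * sub_dim]
        # Initialize centroids from randomly chosen items.
        centroids = chunk[rng.choice(n, n_codes, replace=False)].copy()
        for _ in range(n_iters):
            dists = ((chunk[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
            assign = dists.argmin(axis=1)
            for k in range(n_codes):
                members = chunk[assign == k]
                if len(members) > 0:
                    centroids[k] = members.mean(axis=0)
        codebooks[m] = centroids
    return codebooks


def encode(items, codebooks):
    """Replace each sub-vector by the index of its nearest centroid."""
    n_subspaces, _, sub_dim = codebooks.shape
    codes = np.empty((items.shape[0], n_subspaces), dtype=np.int64)
    for m in range(n_subspaces):
        chunk = items[:, m * sub_dim:(m + 1) * sub_dim]
        dists = ((chunk[:, None, :] - codebooks[m][None, :, :]) ** 2).sum(-1)
        codes[:, m] = dists.argmin(axis=1)
    return codes


def approx_scores(query, codes, codebooks):
    """Approximate query-item inner products using lookup tables.

    The table holds the inner product of each query sub-vector with each
    centroid, so scoring an item costs n_subspaces lookups and additions.
    """
    n_subspaces, _, sub_dim = codebooks.shape
    q = query.reshape(n_subspaces, sub_dim)
    table = np.einsum("md,mkd->mk", q, codebooks)  # (n_subspaces, n_codes)
    return table[np.arange(n_subspaces), codes].sum(axis=1)  # (N,)


def top_k(scores, k):
    """Indices of the k highest-scoring items, best first."""
    idx = np.argpartition(-scores, k - 1)[:k]
    return idx[np.argsort(-scores[idx])]
```

With a sequence of N items and M subspaces of K codes each, building the table costs on the order of K times D multiplications and scoring costs N times M lookups, instead of N times D multiplications for the exact inner products. The retrieved top-k items would then be passed to an exact attention layer; how SEMINAR combines codebooks across modalities is not described in the abstract and is not modeled here.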