RUEL: Retrieval-Augmented User Representation with Edge Browser Logs for Sequential Recommendation

Online recommender systems (RS) aim to match user needs with the vast number of resources available on various platforms. A key challenge is modeling user preferences accurately under data sparsity. To address this challenge, some methods leverage external user behavior data from multiple platforms to enrich user representations. However, all of these methods require a consistent user ID across platforms and ignore information from similar users. In this study, we propose RUEL, a novel retrieval-based sequential recommender that effectively incorporates external, anonymous user behavior data from Edge browser logs to enhance recommendation. We first collect and preprocess a large volume of Edge browser logs spanning one year and link them to target entities that correspond to candidate items in recommendation datasets. We then design a contrastive learning framework with a momentum encoder and a memory bank to retrieve the most relevant and diverse browsing sequences from the full browsing log, based on the semantic similarity between user representations. After retrieval, we apply an item-level attentive selector to filter out noisy items and generate refined sequence embeddings for the final predictor. RUEL is the first method that connects user browsing data with typical recommendation datasets, and it can be generalized to various recommendation scenarios and datasets. We conduct extensive experiments on sequential recommendation tasks over four real-world datasets and demonstrate that RUEL significantly outperforms state-of-the-art baselines. We also conduct ablation studies and qualitative analysis to validate the effectiveness of each component of RUEL and to provide further insight into our method.
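To make the retrieval pipeline concrete, the following is a minimal, illustrative sketch in plain Python; it is not RUEL's implementation. Everything concrete in it is an assumption: the mean-pooling encoder, the per-dimension weight vectors standing in for encoder parameters, the InfoNCE temperature, the softmax-threshold item selector, and all names (`encode`, `momentum_update`, `MemoryBank`, `info_nce`, `attentive_select`). It shows a MoCo-style momentum update of the key encoder, a fixed-size FIFO memory bank of browsing-sequence keys, cosine-similarity top-k retrieval, a contrastive loss over memory-bank negatives, and an item-level attention filter. The diversity criterion mentioned above is not modeled here.

```python
import math
from collections import deque


def normalize(v):
    """L2-normalize a vector given as a list of floats (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def encode(seq, weights):
    """Hypothetical sequence encoder: mean-pool item embeddings, scale each
    dimension by a learnable weight, then L2-normalize (so dot = cosine)."""
    dim = len(weights)
    pooled = [sum(item[d] for item in seq) / len(seq) for d in range(dim)]
    return normalize([p * w for p, w in zip(pooled, weights)])


def momentum_update(key_w, query_w, m=0.99):
    """MoCo-style update of the key encoder: theta_k <- m * theta_k + (1 - m) * theta_q."""
    return [m * k + (1 - m) * q for k, q in zip(key_w, query_w)]


class MemoryBank:
    """Fixed-size FIFO queue of (sequence_id, key_embedding) pairs; oldest entries are evicted."""

    def __init__(self, size):
        self.queue = deque(maxlen=size)

    def enqueue(self, seq_id, key):
        self.queue.append((seq_id, key))

    def retrieve(self, query, k):
        """Return the ids of the k stored sequences most similar (cosine) to the query."""
        ranked = sorted(self.queue, key=lambda pair: dot(query, pair[1]), reverse=True)
        return [seq_id for seq_id, _ in ranked[:k]]


def info_nce(query, positive, negatives, tau=0.1):
    """InfoNCE loss: negative log-softmax score of the positive key among positive + negatives."""
    logits = [dot(query, positive) / tau] + [dot(query, n) / tau for n in negatives]
    top = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum_exp = top + math.log(sum(math.exp(x - top) for x in logits))
    return log_sum_exp - logits[0]


def attentive_select(user_vec, items, threshold):
    """Hypothetical item-level selector: softmax attention of the user vector over
    items, drop items whose weight is below `threshold`, and return the
    renormalized weighted sum of the remaining items."""
    scores = [dot(user_vec, it) for it in items]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weighted = [(e / total, it) for e, it in zip(exps, items)]
    kept = [(w, it) for w, it in weighted if w >= threshold]
    if not kept:  # always keep at least the highest-weighted item
        kept = [max(weighted, key=lambda pair: pair[0])]
    z = sum(w for w, _ in kept)
    return [sum(w * it[d] for w, it in kept) / z for d in range(len(items[0]))]


# Toy usage with made-up 2-d item embeddings (illustrative values only).
user_seq = [[1.0, 0.0], [0.8, 0.2]]
browsing_logs = {
    "s1": [[0.9, 0.1], [1.0, 0.0]],
    "s2": [[0.0, 1.0], [0.1, 0.9]],
    "s3": [[0.7, 0.3]],
}
query_w, key_w = [1.0, 1.0], [1.0, 1.0]
bank = MemoryBank(size=8)
for sid, seq in browsing_logs.items():
    bank.enqueue(sid, encode(seq, key_w))  # keys come from the momentum (key) encoder

q = encode(user_seq, query_w)
top_ids = bank.retrieve(q, k=2)  # ["s1", "s3"] for these toy values
refined = [attentive_select(q, browsing_logs[sid], threshold=0.3) for sid in top_ids]
key_w = momentum_update(key_w, query_w)  # applied after an (omitted) gradient step on query_w
```

In MoCo-style training, the slowly moving key encoder keeps the keys stored in the queue roughly consistent with one another, which is what allows a large FIFO bank of previously encoded sequences to serve as retrieval candidates and negatives without re-encoding them at every step. In this sketch, a real system would replace the linear scan in `retrieve` with an approximate nearest-neighbor index.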
