PI2I: A Personalized Item-Based Collaborative Filtering Retrieval Framework

Efficiently selecting relevant content from vast candidate pools is a critical challenge in modern recommender systems. Traditional methods, such as item-to-item collaborative filtering (CF) and two-tower models, often fail to capture complex user-item interactions: item-to-item CF applies a uniform truncation strategy, and two-tower models cross user and item information only late, at the final inner product. To address these limitations, we propose Personalized Item-to-Item (PI2I), a novel two-stage retrieval framework that enhances the personalization capabilities of CF. In the first stage, the Indexer Building Stage (IBS), we optimize the retrieval pool by relaxing truncation thresholds to maximize Hit Rate, thereby temporarily retaining more items that users might be interested in. In the second stage, the Personalized Retrieval Stage (PRS), we introduce an interactive scoring model that overcomes the limitations of inner-product scoring and allows richer modeling of intricate user-item interactions. Additionally, we construct negative samples based on the trigger-target (item-to-item) relationship, which keeps offline training consistent with online inference. Offline experiments on large-scale real-world datasets demonstrate that PI2I outperforms traditional CF methods and rivals two-tower models. Deployed in the "Guess You Like" section on Taobao, PI2I achieved a 1.05% increase in online transaction rates. In addition, we have released a large-scale recommendation dataset collected from Taobao, containing the 130 million real-world user interactions used in the experiments of this paper. The dataset is publicly available at https://huggingface.co/datasets/PI2I/PI2I and may serve as a valuable benchmark for the research community.
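The two stages described above can be illustrated with a minimal sketch. It is not the authors' implementation: the co-occurrence index, the function names (`build_index`, `hit_rate`, `retrieve`, `training_pairs`), the toy sessions, and the lookup-table `score_fn` standing in for the interactive scoring model are all assumptions made for illustration. In particular, drawing negatives from the trigger's own indexed candidates is only one plausible reading of "negative samples based on the trigger-target relationship".

```python
from collections import defaultdict
from itertools import combinations


def build_index(sessions, top_k):
    """Stage 1 (assumed form): co-occurrence item-to-item index.

    top_k is the truncation threshold; a larger value relaxes truncation.
    """
    co = defaultdict(lambda: defaultdict(int))
    for items in sessions:
        for a, b in combinations(sorted(set(items)), 2):
            co[a][b] += 1
            co[b][a] += 1
    index = {}
    for trigger, neighbours in co.items():
        # Rank by co-occurrence count, break ties by item id for determinism.
        ranked = sorted(neighbours.items(), key=lambda kv: (-kv[1], kv[0]))
        index[trigger] = [item for item, _ in ranked[:top_k]]
    return index


def hit_rate(index, user_triggers, held_out):
    """Fraction of users whose held-out target appears among their candidates."""
    hits = 0
    for user, target in held_out.items():
        candidates = {c for t in user_triggers[user] for c in index.get(t, [])}
        hits += target in candidates
    return hits / len(held_out)


def retrieve(index, user, triggers, score_fn, n):
    """Stage 2 (assumed form): rescore each (user, trigger, target) triple."""
    best = {}
    for t in triggers:
        for c in index.get(t, []):
            if c in triggers:
                continue  # skip items the user has already interacted with
            s = score_fn(user, t, c)
            if c not in best or s > best[c]:
                best[c] = s
    return sorted(best, key=lambda c: (-best[c], c))[:n]


def training_pairs(index, user, trigger, clicked):
    """Label the trigger's indexed candidates: clicked = 1, otherwise 0."""
    return [(user, trigger, c, int(c in clicked)) for c in index.get(trigger, [])]


# Toy data (hypothetical): each session is a list of co-interacted items.
sessions = [["a", "b"], ["a", "b"], ["a", "c"], ["a", "d"], ["b", "c"]]
user_triggers = {"u1": ["a"]}
held_out = {"u1": "d"}

strict = build_index(sessions, top_k=1)
relaxed = build_index(sessions, top_k=3)

# Stand-in for a learned interactive model: a per-user preference lookup.
prefs = {("u1", "d"): 0.9, ("u1", "c"): 0.2}
score_fn = lambda user, trigger, target: prefs.get((user, target), 0.0)

top2 = retrieve(relaxed, "u1", user_triggers["u1"], score_fn, n=2)
pairs = training_pairs(relaxed, "u1", "a", clicked={"d"})
```

On this toy data the strict index keeps only the most frequent neighbour of the trigger and misses the held-out item, while the relaxed index retains it and leaves the final selection to the personalized scorer. Because training pairs and online candidates both come from the same index, the scorer is trained on the same kind of (trigger, target) pairs it sees at inference time.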
