Item-to-Item (I2I) recommendation models are widely used in real-world systems due to their scalability, real-time capabilities, and high recommendation quality. Research on improving I2I performance follows two directions: (1) model-centric approaches, which adopt deeper architectures but risk increased computational costs and deployment complexity, and (2) data-centric methods, which refine training data without altering models, offering cost-effectiveness but struggling with data sparsity and noise. To address these challenges, we propose LLM-I2I, a data-centric framework that leverages Large Language Models (LLMs) to mitigate data quality issues. LLM-I2I includes (1) an LLM-based generator that synthesizes user-item interactions for long-tail items, alleviating data sparsity, and (2) an LLM-based discriminator that filters noisy interactions from both real and synthetic data. The refined data is then fused to train I2I models. Evaluated on an industry dataset (AEDS) and an academic dataset (ARD), LLM-I2I consistently improves recommendation accuracy, particularly for long-tail items. Deployed on a large-scale cross-border e-commerce platform, it boosts recall number (RN) by 6.02% and gross merchandise value (GMV) by 1.22% over existing I2I models. This work highlights the potential of LLMs for enhancing data-centric recommendation systems without modifying model architectures.
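The generate-then-filter pipeline described above can be pictured with a short Python sketch. Everything below is an illustrative assumption: the `llm` callable stands in for any text-in/text-out LLM interface, the prompt formats are hypothetical, and the toy co-occurrence trainer is only a placeholder for the production I2I model; the abstract does not commit to a specific prompt design or base model.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Interaction = Tuple[str, str]  # (user_id, item_id)


def generate_synthetic_interactions(
    llm: Callable[[str], str],
    long_tail_items: List[str],
    item_descriptions: Dict[str, str],
    n_per_item: int = 5,
) -> List[Interaction]:
    """Ask the LLM for plausible user profiles per long-tail item and turn
    them into synthetic user-item interactions (the 'generator' step)."""
    synthetic: List[Interaction] = []
    for item in long_tail_items:
        prompt = (
            f"Item description: {item_descriptions[item]}\n"
            f"List {n_per_item} short profiles of users likely to click this item, one per line."
        )
        profiles = [p for p in llm(prompt).splitlines() if p.strip()]
        for idx, _profile in enumerate(profiles[:n_per_item]):
            synthetic.append((f"synthetic_user_{item}_{idx}", item))
    return synthetic


def discriminate_interactions(
    llm: Callable[[str], str],
    interactions: List[Interaction],
    item_descriptions: Dict[str, str],
) -> List[Interaction]:
    """Keep only interactions the LLM judges plausible (the 'discriminator' step)."""
    kept: List[Interaction] = []
    for user, item in interactions:
        prompt = (
            f"User: {user}\nItem: {item_descriptions.get(item, item)}\n"
            "Is this a plausible interaction? Answer yes or no."
        )
        if llm(prompt).strip().lower().startswith("yes"):
            kept.append((user, item))
    return kept


def train_i2i(interactions: List[Interaction]) -> Dict[str, Dict[str, int]]:
    """Toy co-occurrence I2I model: items interacted with by the same user are
    treated as related; a real deployment would train its production I2I model."""
    user_items: Dict[str, List[str]] = defaultdict(list)
    for user, item in interactions:
        user_items[user].append(item)
    scores: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for items in user_items.values():
        for a in items:
            for b in items:
                if a != b:
                    scores[a][b] += 1
    return scores


def llm_i2i_pipeline(llm, real_interactions, long_tail_items, item_descriptions):
    # (1) synthesize interactions for long-tail items,
    # (2) filter noise from real + synthetic data,
    # (3) fuse the refined data and train the downstream I2I model.
    synthetic = generate_synthetic_interactions(llm, long_tail_items, item_descriptions)
    refined = discriminate_interactions(llm, real_interactions + synthetic, item_descriptions)
    return train_i2i(refined)
```

The sketch only conveys the generator → discriminator → fusion data flow; it leaves the downstream model untouched, which is the point of the data-centric framing.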