Large-scale recommendation systems operate across diverse domains, yet they suffer from data sparsity and noisy implicit feedback. Traditional approaches mitigate these problems via model-specific knowledge distillation from source domains to a target domain. Inspired by the transformative success of synthetic data generation in large language models (LLMs), we introduce Synthetic Cross-domain Augmentation and Learning for Recommendation (SCALR), a framework that generates synthetic user-item interaction events for a target recommendation domain by leveraging observed events from a source domain. SCALR decomposes cross-domain learning into two modular stages. First, it translates observed user events from a source domain into the target domain, framing event generation as estimating the likelihood that a user would interact with a target-domain item, conditioned on that user's observed interactions in the source domain. Second, downstream models train on these synthetic events as cross-domain learning objectives; the synthetic events augment the target domain's training data in a model-agnostic manner. Our approach yields statistically significant improvements in online A/B tests on an industrial recommendation platform. To the best of our knowledge, this is among the first works to explicitly frame cross-domain event transfer as synthetic data generation for recommendation systems.
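The two-stage decomposition can be illustrated with a minimal sketch. This is not the SCALR implementation: the abstract does not specify the generator, so the sketch assumes a simple co-occurrence estimate of the likelihood of a target-domain interaction given a source-domain interaction, computed over users observed in both domains. All function names, the `threshold` parameter, and the toy events are hypothetical.

```python
from collections import Counter, defaultdict


def group_by_user(events):
    """Map each user to the set of items they interacted with."""
    items = defaultdict(set)
    for user, item in events:
        items[user].add(item)
    return items


def fit_transfer_scores(source_events, target_events):
    """Stage 1a (assumed estimator): P(target item | source item),
    counted over users who appear in both domains."""
    src, tgt = group_by_user(source_events), group_by_user(target_events)
    pair_counts = defaultdict(Counter)  # source item -> target item counts
    src_counts = Counter()              # source item -> overlapping users
    for user in src.keys() & tgt.keys():
        for s in src[user]:
            src_counts[s] += 1
            for t in tgt[user]:
                pair_counts[s][t] += 1
    return {s: {t: c / src_counts[s] for t, c in ts.items()}
            for s, ts in pair_counts.items()}


def generate_synthetic_events(source_events, target_events, scores, threshold=0.6):
    """Stage 1b: emit (user, target item, likelihood) for likely but unobserved pairs."""
    src, tgt = group_by_user(source_events), group_by_user(target_events)
    synthetic = []
    for user, src_items in src.items():
        likelihood = defaultdict(float)
        for s in src_items:
            for t, p in scores.get(s, {}).items():
                likelihood[t] = max(likelihood[t], p)
        for t, p in sorted(likelihood.items()):
            if p >= threshold and t not in tgt.get(user, set()):
                synthetic.append((user, t, p))
    return synthetic


def augment_training_data(target_events, synthetic_events):
    """Stage 2: merge observed and synthetic events into one weighted training set.
    Any downstream model can consume these rows, which keeps the step model-agnostic."""
    rows = [(u, i, 1.0, "observed") for u, i in target_events]
    rows += [(u, i, p, "synthetic") for u, i, p in synthetic_events]
    return rows


# Hypothetical toy data: (user, item) pairs in each domain.
source_events = [("u1", "s_a"), ("u2", "s_a"), ("u3", "s_a"), ("u3", "s_b")]
target_events = [("u1", "t_x"), ("u2", "t_x"), ("u2", "t_y")]

scores = fit_transfer_scores(source_events, target_events)
synthetic = generate_synthetic_events(source_events, target_events, scores)
train_rows = augment_training_data(target_events, synthetic)
```

In this toy setup, user `u3` has no target-domain history, so the only target-domain training signal for that user comes from the synthetic event. Keeping the likelihood as a per-row weight is one possible design choice; a downstream model could equally treat it as a soft label or ignore it.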