E-commerce campaign ranking models require large-scale training labels indicating which users purchased due to campaign influence. However, generating these labels is challenging because campaigns use creative, thematic language that does not directly map to product purchases. Without clear product-level attribution, supervised learning for campaign optimization remains limited. We present Campaign-2-PT-RAG, a scalable label generation framework that constructs user-campaign purchase labels by inferring which product types (PTs) each campaign promotes. The framework first interprets campaign content using large language models (LLMs) to capture implicit intent, then retrieves candidate PTs through semantic search over the platform taxonomy. A structured LLM-based classifier evaluates each PT's relevance, producing a campaign-specific product coverage set. User purchases matching these PTs generate positive training labels for downstream ranking models. This approach reframes the ambiguous attribution problem as a tractable semantic alignment task, enabling scalable and consistent supervision for tasks such as campaign ranking optimization in production e-commerce environments. Experiments on internal and synthetic datasets, validated against expert-annotated campaign-PT mappings, show that our LLM-assisted approach generates high-quality labels with 78-90% precision while maintaining over 99% recall.
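As an illustration of the final labeling step, the following minimal sketch assumes that a campaign's PT coverage set has already been produced by the LLM-based retrieval and classification stages, and shows how purchases matching those PTs could become positive user-campaign labels. It also shows one plausible way to score a predicted PT set against an expert-annotated mapping. The names `Purchase`, `build_labels`, and `precision_recall`, the campaign identifier, and all data are hypothetical; the abstract does not specify the actual implementation or the exact level at which precision and recall are computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Purchase:
    user_id: str
    product_type: str  # PT of the purchased item (hypothetical schema)


def build_labels(campaign_pts, purchases):
    """Return the set of (user_id, campaign_id) pairs labeled positive.

    campaign_pts: dict mapping campaign_id -> set of PTs judged relevant
                  (assumed to come from the upstream LLM stages)
    purchases:    iterable of Purchase records
    """
    positives = set()
    for purchase in purchases:
        for campaign_id, pts in campaign_pts.items():
            # A purchase whose PT falls in the campaign's coverage set
            # yields a positive label for that user-campaign pair.
            if purchase.product_type in pts:
                positives.add((purchase.user_id, campaign_id))
    return positives


def precision_recall(predicted, expert):
    """PT-level precision and recall against an expert-annotated PT set."""
    true_positives = len(predicted & expert)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(expert) if expert else 0.0
    return precision, recall


# Toy data, invented for illustration only.
campaign_pts = {
    "cozy_fall_nights": {"blanket", "candle", "mug", "umbrella"},
}
purchases = [
    Purchase("u1", "blanket"),
    Purchase("u2", "laptop"),
    Purchase("u3", "mug"),
]
expert_pts = {"blanket", "candle", "mug"}

labels = build_labels(campaign_pts, purchases)
precision, recall = precision_recall(campaign_pts["cozy_fall_nights"], expert_pts)
```

In this toy case the predicted coverage set contains one PT the expert did not select, so recall is perfect while precision is lower, which mirrors the high-recall, moderate-precision profile reported above.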