E-commerce campaign ranking models require large-scale training labels indicating which users purchased because of a campaign's influence. Generating these labels is difficult because campaigns use creative, thematic language that does not map directly to product purchases. Without clear product-level attribution, supervised learning for campaign optimization remains limited. We present \textbf{Campaign-2-PT-RAG}, a scalable label generation framework that constructs user--campaign purchase labels by inferring which product types (PTs) each campaign promotes. The framework first interprets campaign content with large language models (LLMs) to capture implicit intent, then retrieves candidate PTs through semantic search over the platform taxonomy. A structured LLM-based classifier evaluates each candidate PT's relevance, producing a campaign-specific product coverage set. User purchases that match these PTs become positive training labels for downstream ranking models. This approach reframes the ambiguous attribution problem as a tractable semantic alignment task, enabling scalable and consistent supervision for tasks such as campaign ranking optimization in production e-commerce environments. Experiments on internal and synthetic datasets, validated against expert-annotated campaign--PT mappings, show that our LLM-assisted approach generates high-quality labels with 78--90% precision while maintaining over 99% recall.
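The retrieve, classify, and label steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the token-overlap score stands in for semantic search, the `is_relevant` callback stands in for the LLM-based classifier, and the taxonomy, campaign name, purchases, and expert mapping are all hypothetical toy data.

```python
from typing import Callable, Dict, List, Set, Tuple


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of lowercase tokens (toy stand-in for embedding similarity)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def retrieve_candidates(intent: str, taxonomy: List[str], k: int) -> List[str]:
    """Return the k product types most similar to the interpreted campaign intent."""
    ranked = sorted(taxonomy, key=lambda pt: token_overlap(intent, pt), reverse=True)
    return ranked[:k]


def coverage_set(
    intent: str,
    taxonomy: List[str],
    is_relevant: Callable[[str, str], bool],
    k: int = 3,
) -> Set[str]:
    """Keep only the retrieved candidates that the classifier judges relevant."""
    return {pt for pt in retrieve_candidates(intent, taxonomy, k) if is_relevant(intent, pt)}


def positive_labels(
    purchases: List[Tuple[str, str]],
    coverage: Dict[str, Set[str]],
) -> Set[Tuple[str, str]]:
    """(user, campaign) pairs where the purchased PT is in the campaign's coverage set."""
    return {
        (user, campaign)
        for user, pt in purchases
        for campaign, pts in coverage.items()
        if pt in pts
    }


def precision_recall(predicted: Set[str], gold: Set[str]) -> Tuple[float, float]:
    """Precision and recall of a predicted PT set against an expert-annotated set."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy data, for illustration only.
taxonomy = ["camping tent", "sleeping bag", "hiking boots", "office chair"]
intent = "outdoor camping tent sleeping bag hiking boots"  # assumed LLM interpretation

# Stand-in classifier: accept any candidate sharing at least one token with the intent.
covered = coverage_set(intent, taxonomy, lambda i, pt: token_overlap(i, pt) > 0.0)

purchases = [("u1", "sleeping bag"), ("u2", "office chair")]
labels = positive_labels(purchases, {"summer_escape": covered})

expert_pts = {"camping tent", "sleeping bag"}  # hypothetical expert annotation
precision, recall = precision_recall(covered, expert_pts)
```

In this toy run the coverage set contains one PT that the hypothetical expert mapping does not, so recall is perfect while precision is lower. This is the same kind of trade-off that the reported precision and recall figures measure.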