Large language models (LLMs) have shown strong capabilities in natural language understanding and generation, but deploying them directly in online advertising systems is often impractical because of strict millisecond-level latency constraints. This has motivated using LLMs offline to improve retrieval, ranking, and recommendation models. Existing solutions typically fine-tune a separate LLM for each task, such as query-ad relevance labeling, keyword-based query generation, and user profiling. The result is redundant models, high maintenance costs, and limited performance gains, even though these tasks share substantial domain knowledge and reasoning patterns. We introduce AdNanny, a unified reasoning-centric LLM that serves as a shared backbone for offline advertising tasks. AdNanny is obtained by fine-tuning a public 671B-parameter DeepSeek-R1 checkpoint using a scalable training system that supports hybrid dense-MoE parallelism. We construct reasoning-augmented corpora that pair structured supervision with step-by-step natural language explanations. A multi-task supervised fine-tuning stage with adaptive reweighting enables AdNanny to handle diverse labeling and generation tasks in a consistent reasoning format. This stage is followed by reinforcement learning driven by downstream advertising metrics, which aligns model behavior with online retrieval and ranking objectives. AdNanny is deployed in production within Bing Ads, where it significantly reduces manual labeling effort and improves accuracy across multiple offline tasks. By consolidating many task-specific models into a single reasoning-centric foundation model, AdNanny provides a scalable and cost-effective solution for large-scale advertising systems.
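The abstract mentions multi-task supervised fine-tuning with adaptive reweighting but does not specify the reweighting rule. As a rough illustration only, the sketch below uses one common heuristic: a softmax over current per-task losses, so that tasks the model handles poorly receive more weight. The function names, the `temperature` parameter, the task names, and the loss values are all hypothetical and are not taken from AdNanny.

```python
import math


def adaptive_task_weights(task_losses, temperature=1.0):
    """Return per-task weights via a softmax over current task losses.

    Assumption: higher-loss tasks should be upweighted. This is one common
    heuristic, not necessarily the rule used by AdNanny.
    """
    if not task_losses:
        raise ValueError("task_losses must not be empty")
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtract the maximum loss before exponentiating for numerical stability.
    max_loss = max(task_losses.values())
    exps = {
        task: math.exp((loss - max_loss) / temperature)
        for task, loss in task_losses.items()
    }
    total = sum(exps.values())
    return {task: value / total for task, value in exps.items()}


def weighted_multitask_loss(task_losses, weights):
    """Combine per-task losses into one scalar using the given weights."""
    return sum(weights[task] * task_losses[task] for task in task_losses)


# Hypothetical per-task losses for the three task types named in the abstract.
losses = {
    "relevance_labeling": 0.9,
    "query_generation": 1.4,
    "user_profiling": 0.6,
}
weights = adaptive_task_weights(losses)
total_loss = weighted_multitask_loss(losses, weights)
```

In a real training loop the weights would typically be recomputed periodically from smoothed losses rather than from a single batch, and a larger `temperature` would move them closer to uniform.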