LLM Retrieval for Stable and Predictable Ad Recommendations

Vinodh Kumar Sunkara, Satheeshkumar Karuppusamy, Hangjun Xu, Sai Deepika Regani, Kshitij Gupta, Gaby Nahum, Sneha Iyer, Jean-Baptiste Fiot, Yinglong Guo, Xiaowen Guo, Atul Jangra, Yucheng Liu, Jinghao Yan, Vijay Pappu, Benjamin Schulte, Deepak Chandra

From arXiv; SIGIR 2026 AgentSearch Workshop, Melbourne, Australia

Traditional ads recommendation systems have primarily optimized the prediction accuracy of click or conversion events, measured with canonical metrics such as recall or normalized discounted cumulative gain (NDCG). With the hyper-growth of ads inventory and liquidity driven by generative AI technologies, prediction stability and predictability are becoming increasingly critical. Intuitively, prediction stability and predictability quantify system robustness to minor or noisy perturbations of the input (ads, creatives); without them, advertisers can observe problems such as poor repeatability, cold start, and under-exploration. In this paper, we introduce a new evaluation framework for quantifying the stability and predictability of an ads recommender system, and present an online-validated semantic candidate generation framework powered by fine-tuned Large Language Models (LLMs) that significantly improves these metrics by fundamentally improving the semantic awareness of the system. The approach extracts hierarchical semantic attributes from ad creatives to obtain LLM representations, which serve as the foundation for graph-based expansion. This ensures that the retrieved candidates include semantic variants of an ad, so that small creative variants from the advertiser yield consistent and explainable delivery results to the user. We tested this LLM ads retrieval framework in a large-scale industrial ads recommendation system, where offline and online A/B experiments showed significant gains in both predictability and traditional performance metrics. Although evaluated in the ads stack, this is a general framework that can be applied broadly to any large-scale recommendation or retrieval system facing similar scaling and predictability challenges.
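The abstract describes stability only intuitively, as robustness of retrieval to small input perturbations, and does not give the paper's metric definitions. As a minimal illustration of that intuition, the sketch below scores stability as the mean Jaccard overlap between the top-k candidates retrieved for an ad embedding and for a slightly perturbed copy of it. The overlap measure, the cosine-similarity retrieval, and every name here (`top_k`, `jaccard`, `stability_at_k`) are assumptions made for illustration, not the authors' framework.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_k(query, catalog, k):
    """Return the ids of the k catalog items most similar to the query."""
    ranked = sorted(catalog, key=lambda item_id: cosine(query, catalog[item_id]),
                    reverse=True)
    return set(ranked[:k])


def jaccard(a, b):
    """Overlap of two candidate sets; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def stability_at_k(queries, perturb, catalog, k):
    """Hypothetical stability score: mean top-k overlap between each query
    embedding and its perturbed variant (1.0 means fully stable)."""
    if not queries:
        raise ValueError("need at least one query")
    scores = [jaccard(top_k(q, catalog, k), top_k(perturb(q), catalog, k))
              for q in queries]
    return sum(scores) / len(scores)
```

Under these assumptions, a score near 1.0 would mean that a small creative change leaves the retrieved candidates essentially unchanged, while a score near 0.0 would mean the same ad with a minor edit retrieves an almost entirely different candidate set.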
