Ad Insertion in LLM-Generated Responses

Sustainable monetization of Large Language Models (LLMs) remains a critical open challenge. Traditional search advertising, which relies on static keywords, fails to capture the fleeting, context-dependent user intents (the specific information, goods, or services a user seeks) embedded in conversational flows. Beyond the standard goal of social welfare maximization, effective LLM advertising must also satisfy requirements on contextual coherence (ensuring ads align semantically with transient user intents), computational efficiency (avoiding added latency in user interactions), and adherence to ethical and regulatory standards, including preserving privacy and ensuring explicit ad disclosure. Although several recent solutions have explored bidding at the token level and at the query level, both categories of approaches generally fail to satisfy this multifaceted set of constraints holistically. We propose a practical framework that resolves these tensions through two decoupling strategies. First, we decouple ad insertion from response generation to ensure safety and explicit disclosure. Second, we decouple bidding from specific user queries by using "genres" (high-level semantic clusters) as a proxy. This allows advertisers to bid on stable categories rather than on sensitive real-time responses, reducing both computational burden and privacy risks. We demonstrate that applying the VCG auction mechanism to this genre-based framework yields approximately dominant strategy incentive compatibility (DSIC) and individual rationality (IR), as well as approximately optimal social welfare, while maintaining high computational efficiency. Finally, we introduce an "LLM-as-a-Judge" metric to estimate contextual coherence. Our experiments show that this metric correlates strongly with human ratings (Spearman's $\rho \approx 0.66$), outperforming 80% of individual human evaluators.
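To make the genre-based bidding idea concrete, the following is a minimal sketch of a VCG auction over genre-level bids, under assumptions that go beyond what the abstract states: a single ad slot per response, one scalar bid per advertiser per genre, and a query that has already been mapped to a genre. With one slot, the VCG payment reduces to the second-highest bid. The names `Outcome`, `vcg_single_slot`, and `genre_bids` are hypothetical and chosen only for illustration; this is not the paper's full mechanism, which the abstract describes as only approximately DSIC.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Outcome:
    winner: Optional[str]  # advertiser id, or None if nobody bid on the genre
    payment: float         # VCG payment charged to the winner


def vcg_single_slot(genre_bids: Dict[str, Dict[str, float]], genre: str) -> Outcome:
    """Single-slot VCG auction among advertisers who bid on `genre`.

    Assumption (not from the source): one slot, so the winner's VCG payment
    equals the welfare the other bidders lose by its presence, i.e. the
    second-highest bid (0.0 if there is no competitor).
    """
    bids = genre_bids.get(genre, {})
    if not bids:
        return Outcome(winner=None, payment=0.0)
    # Rank advertisers by bid, highest first; ties keep insertion order.
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    winner = ranked[0][0]
    payment = ranked[1][1] if len(ranked) > 1 else 0.0
    return Outcome(winner=winner, payment=payment)


# Hypothetical bids: advertisers bid on stable genres, not on individual queries.
genre_bids = {
    "travel": {"airline": 3.0, "hotel": 2.0, "luggage": 1.0},
    "cooking": {"cookware": 1.5},
}
travel_outcome = vcg_single_slot(genre_bids, "travel")
cooking_outcome = vcg_single_slot(genre_bids, "cooking")
```

In this sketch the auction depends only on the genre label, so bids can be collected ahead of time and no advertiser needs to see the user's query or the generated response, which is the privacy and efficiency argument the abstract makes for bidding on genres.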
