Large language models (LLMs) enable a new form of advertising for retrieval-augmented generation (RAG) systems in which organic responses are blended with contextually relevant ads. The prospect of such "generated native ads" has sparked interest in whether they can be detected automatically. Existing datasets, however, do not reflect the diversity of advertising styles discussed in the marketing literature. In this paper, we (1) develop a taxonomy of advertising styles for LLMs that combines two style dimensions, explicitness and type of appeal, (2) simulate advertisers who attempt to evade detection by changing their advertising style, and (3) evaluate a variety of ad-detection approaches for their robustness to such changes. Extending previous work on ad detection, we train models that use entity recognition to locate the exact position of an ad in an LLM response, and find them to be both very effective at detecting responses with ads and largely robust to changes in advertising style. Since ad blocking will be performed on low-resource end-user devices, we also evaluate lightweight models such as random forests and SVMs. These models, however, are brittle under changes in advertising style, highlighting the need for further efficiency-oriented research toward a practical approach to blocking generated ads.