Large language models (LLMs) enable a new form of advertising for retrieval-augmented generation (RAG) systems in which organic responses are blended with contextually relevant ads. The prospect of such "generated native ads" has sparked interest in whether they can be detected automatically. Existing datasets, however, do not reflect the diversity of advertising styles discussed in the marketing literature. In this paper, we (1) develop a taxonomy of advertising styles for LLMs that combines two style dimensions, explicitness and type of appeal, (2) simulate advertisers who attempt to evade detection by changing their advertising style, and (3) evaluate a variety of ad-detection approaches for their robustness to these changes. Extending previous work on ad detection, we train models that use entity recognition to locate the exact position of an ad in an LLM response, and find them to be both very effective at detecting responses with ads and largely robust to changes in advertising style. Since ad blocking will be performed on low-resource end-user devices, we also evaluate lightweight models such as random forests and SVMs. These models, however, are brittle under such style changes, highlighting the need for further efficiency-oriented research toward practical blocking of generated ads.