Recent generative engine optimisation (GEO) research has shown that prompt-injection attacks can push a target product to the top of an LLM's recommendation list, with the strongest attacks reporting around $80\%$ success and raising serious security concerns about RAG-based recommendation. However, these results assume the attacked document is always fed directly to the generator, bypassing the retriever and reranker. This is unrealistic: in deployed RAG systems, the attack modifies the document content, which can in turn change whether the document is retrieved and reranked highly enough to reach the generator at all. In this paper, we re-evaluate seven GEO attacks under a realistic three-stage pipeline (retriever\,$\to$\,LLM reranker\,$\to$\,LLM generator). We find that prior protocols substantially overstate attack effectiveness: gradient-based and instruction override attacks largely collapse before reaching the generator, and only LLM-driven prompt injections remain effective end-to-end. Our analysis further reveals that current GEO attacks are easily detectable: a lightweight prompt-injection guard finetuned on a small attack dataset already detects every attack. Our code and data are available at https://github.com/ielab/geo_injection_rag_survival.
翻译:近期生成式引擎优化(GEO)研究表明,提示注入攻击能够将目标产品推至大语言模型推荐列表的顶端,最强攻击报告的成功率约为80%,这引发了人们对基于RAG的推荐系统的严重安全担忧。然而,这些结果假设被攻击文档始终直接输入生成器,绕过了检索器和重排序器。这并不符合实际:在已部署的RAG系统中,攻击会修改文档内容,而这可能反过来改变文档是否被检索和重排序至足够高的程度,从而能否最终到达生成器。本文在真实的三阶段流水线(检索器→大语言模型重排序器→大语言模型生成器)下重新评估了七种GEO攻击。我们发现先前的评估协议大幅高估了攻击有效性:基于梯度和指令覆盖的攻击在到达生成器前基本失效,仅大语言模型驱动的提示注入仍能保持端到端有效性。我们的分析进一步表明,当前的GEO攻击极易被检测:一个基于小规模攻击数据集微调的轻量级提示注入检测器已能识别所有攻击。我们的代码和数据可在https://github.com/ielab/geo_injection_rag_survival获取。