The increasing integration of Large Language Model (LLM)-based search engines has transformed the landscape of information retrieval. However, these systems are vulnerable to adversarial attacks, especially ranking manipulation attacks, in which attackers craft webpage content to manipulate the LLM's ranking and promote specific content, gaining an unfair advantage over competitors. In this paper, we study the dynamics of ranking manipulation attacks. We frame the problem as an Infinitely Repeated Prisoners' Dilemma in which multiple players strategically decide whether to cooperate or attack. We analyze the conditions under which cooperation can be sustained and identify key factors that influence player behavior: attack costs, discount rates, attack success rates, and trigger strategies. We identify tipping points in the system dynamics and show that cooperation is more likely to be sustained when players are forward-looking. From a defense perspective, however, we find that simply reducing attack success probabilities can, paradoxically, incentivize attacks under certain conditions. Furthermore, defensive measures that cap the upper bound of attack success rates may prove futile in some scenarios. These insights highlight the complexity of securing LLM-based systems. Our work provides a theoretical foundation and practical insights for understanding and mitigating their vulnerabilities, while emphasizing the importance of adaptive security strategies and thoughtful ecosystem design.
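To make the notion of "sustained cooperation" concrete, the following is a minimal sketch of the textbook grim-trigger condition for a two-player infinitely repeated Prisoner's Dilemma. It is not the model analyzed in the paper: the abstract does not give the payoff structure, so the payoff names (`temptation`, `reward`, `punishment`), the function names, and the numeric values used in the usage note are all illustrative assumptions. In the paper's setting, attack costs and attack success rates would presumably enter through these payoffs. Under grim trigger, a player cooperates until any attack is observed and attacks forever afterwards, so cooperation is sustainable when the discounted value of cooperating forever is at least the value of attacking once and being punished in every later round.

```python
def critical_discount_factor(temptation: float, reward: float, punishment: float) -> float:
    """Smallest discount factor at which grim trigger sustains cooperation.

    Textbook result for the two-player repeated Prisoner's Dilemma:
    delta* = (T - R) / (T - P), assuming T > R > P.
    All payoffs here are hypothetical, not taken from the paper.
    """
    if not (temptation > reward > punishment):
        raise ValueError("expected temptation > reward > punishment")
    return (temptation - reward) / (temptation - punishment)


def cooperation_sustainable(delta: float, temptation: float,
                            reward: float, punishment: float) -> bool:
    """Compare the two discounted payoff streams directly."""
    if not (0.0 <= delta < 1.0):
        raise ValueError("discount factor must lie in [0, 1)")
    # Cooperate forever: reward in every round.
    cooperate_value = reward / (1.0 - delta)
    # Attack once, then receive the punishment payoff in every later round.
    deviate_value = temptation + delta * punishment / (1.0 - delta)
    return cooperate_value >= deviate_value
```

With the illustrative payoffs `temptation=5`, `reward=3`, `punishment=1`, `critical_discount_factor(5, 3, 1)` gives a threshold of one half: a player who weights the future at `delta=0.6` keeps cooperating, while one at `delta=0.3` prefers to attack. This mirrors, in the simplest possible setting, the abstract's observation that forward-looking players are more likely to sustain cooperation.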