The Vulnerability of LLM Rankers to Prompt Injection Attacks

Large Language Models (LLMs) have emerged as powerful re-rankers. Recent research has shown, however, that simple prompt injections embedded within a candidate document (i.e., jailbreak prompt attacks) can significantly alter an LLM's ranking decisions. While this poses serious security risks to LLM-based ranking pipelines, the extent to which this vulnerability persists across diverse LLM families, architectures, and settings remains largely under-explored. In this paper, we present a comprehensive empirical study of jailbreak prompt attacks against LLM rankers. We focus our evaluation on two complementary tasks: (1) Preference Vulnerability Assessment, which measures intrinsic susceptibility via attack success rate (ASR); and (2) Ranking Vulnerability Assessment, which quantifies the operational impact on ranking quality (nDCG@10). We systematically examine three prevalent ranking paradigms (pairwise, listwise, and setwise) under two injection variants: decision objective hijacking and decision criteria hijacking. Beyond reproducing prior findings, we expand the analysis to cover vulnerability scaling across model families, position sensitivity, backbone architectures, and cross-domain robustness. Our results characterize the boundary conditions of these vulnerabilities and reveal critical insights, for example that encoder-decoder architectures exhibit strong inherent resilience to jailbreak attacks. We publicly release our code and additional experimental results at https://github.com/ielab/LLM-Ranker-Attack.
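To make the two evaluation metrics concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that ASR is the fraction of attacked trials in which the ranker prefers the injected document, and that nDCG@10 uses linear gain with a log2 rank discount, with the ideal ranking computed from the same list of judged documents. The function names `attack_success_rate`, `dcg_at_k`, and `ndcg_at_k` are hypothetical and do not come from the released code.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """Discounted cumulative gain over the top-k positions (linear gain, assumed)."""
    # enumerate() is 0-based, so rank 1 gets the discount log2(2) = 1.
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances: Sequence[float], k: int = 10) -> float:
    """nDCG@k for one query, given relevance labels in ranked order.

    Assumption: the list contains every judged document for the query,
    so sorting it in descending order yields the ideal ranking.
    """
    ideal = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    if ideal == 0:
        return 0.0  # no relevant documents: define nDCG as 0
    return dcg_at_k(ranked_relevances, k) / ideal


def attack_success_rate(attack_succeeded: Sequence[bool]) -> float:
    """Fraction of attacked trials in which the injected document was preferred."""
    if not attack_succeeded:
        raise ValueError("at least one trial is required")
    return sum(attack_succeeded) / len(attack_succeeded)
```

Under these assumptions, the drop in ranking quality caused by an attack can be read as the difference between `ndcg_at_k` on the clean ranking and on the attacked ranking for the same query, averaged over queries.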

