Large Language Models (LLMs) have emerged as powerful re-rankers. However, recent research has shown that simple prompt injections embedded within a candidate document (i.e., jailbreak prompt attacks) can significantly alter an LLM's ranking decisions. While this poses serious security risks to LLM-based ranking pipelines, the extent to which this vulnerability persists across diverse LLM families, architectures, and settings remains largely under-explored. In this paper, we present a comprehensive empirical study of jailbreak prompt attacks against LLM rankers. We focus our evaluation on two complementary tasks: (1) Preference Vulnerability Assessment, measuring intrinsic susceptibility via attack success rate (ASR); and (2) Ranking Vulnerability Assessment, quantifying the operational impact on ranking quality (nDCG@10). We systematically examine three prevalent ranking paradigms (pairwise, listwise, setwise) under two injection variants: decision objective hijacking and decision criteria hijacking. Beyond reproducing prior findings, we expand the analysis to cover vulnerability scaling across model families, position sensitivity, backbone architectures, and cross-domain robustness. Our results characterize the boundary conditions of these vulnerabilities, revealing critical insights such as the strong inherent resilience of encoder-decoder architectures to jailbreak attacks. We publicly release our code and additional experimental results at https://github.com/ielab/LLM-Ranker-Attack.