Large Reasoning Models (LRMs) have demonstrated promising performance on complex tasks. However, their resource-intensive reasoning processes may be exploited by attackers to maliciously occupy server resources and crash the service, much like a DDoS attack in cyberspace. To this end, we propose ExtendAttack, a novel attack on LRMs that maliciously occupies server resources by stealthily extending the models' reasoning processes. Concretely, we systematically obfuscate characters within a benign prompt, transforming them into a complex poly-base ASCII representation. This compels the model to perform a series of computationally intensive decoding sub-tasks that are deeply embedded within the semantic structure of the query itself. Extensive experiments demonstrate the effectiveness of ExtendAttack. Remarkably, it increases the length of the o3 model's responses by more than 2.5 times on the HumanEval benchmark. Moreover, it preserves the original meaning of the query and achieves comparable answer accuracy, demonstrating its stealthiness.
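To make the obfuscation idea concrete, the sketch below illustrates one plausible way to turn characters of a benign prompt into a poly-base ASCII representation. This is a hypothetical illustration, not the authors' exact implementation: the replacement rate, the choice of bases, and the `<...>` delimiters are all assumptions made here for clarity.

```python
import random

def obfuscate(prompt, rate=0.3, seed=0):
    """Hypothetical sketch of ExtendAttack-style obfuscation.

    Randomly selected letters are replaced by their ASCII code point
    rendered in a randomly chosen base (binary, octal, or hex), wrapped
    in <...> markers. A model must decode every such token before it
    can even read the query, inflating its reasoning process while the
    underlying meaning of the prompt is preserved.
    """
    rng = random.Random(seed)  # fixed seed so the attack is reproducible
    formats = ["0b{:b}", "0o{:o}", "0x{:x}"]  # assumed base choices
    out = []
    for ch in prompt:
        if ch.isalpha() and rng.random() < rate:
            fmt = rng.choice(formats)
            out.append("<" + fmt.format(ord(ch)) + ">")
        else:
            out.append(ch)
    return "".join(out)
```

Because each replaced character is still uniquely recoverable (e.g. `<0x73>` decodes back to `s`), the transformed prompt keeps the original query's semantics, which is what allows the attack to achieve comparable answer accuracy while remaining stealthy.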