Most flagship language models generate explicit reasoning chains, enabling inference-time scaling. However, producing these reasoning chains increases token usage (i.e., reasoning tokens), which in turn increases latency and costs. Our OverThink attack increases overhead for applications that rely on reasoning language models (RLMs) and external context by forcing them to spend substantially more reasoning tokens while still producing contextually correct answers. An adversary mounts an attack by injecting decoy reasoning problems into public content that is consumed by RLM at inference time. Because our decoys (e.g., Markov decision processes, Sudokus, etc.) are benign, they evade safety filters. We evaluate OverThink on both closed-source and open-source reasoning models across the FreshQA, SQuAD, and MuSR datasets. We also explore the attack in multi-modal settings by creating images that cause excessive reasoning. We show that the resulting slowdown transfers across models. Finally, we explore both LLM-based and systems-level defenses, and discuss the societal, financial, and energy implications of the OverThink attacks.
翻译:大多数旗舰语言模型会生成显式的推理链,从而实现推理时的可扩展性。然而,生成这些推理链会增加令牌使用量(即推理令牌),进而增加延迟和成本。我们的OverThink攻击通过迫使依赖推理型语言模型(RLM)和外部上下文的应用消耗显著更多的推理令牌(同时仍能生成上下文正确的答案),从而增加其开销。攻击者通过将诱饵推理问题注入到推理时被RLM消费的公共内容中来发起攻击。由于我们的诱饵(例如马尔可夫决策过程、数独等)是良性的,它们能规避安全过滤器。我们在FreshQA、SQuAD和MuSR数据集上对闭源和开源推理模型评估了OverThink攻击。我们还通过创建引发过度推理的图像,在多模态场景中探索了该攻击。我们证明了由此产生的减速效应在不同模型间具有可迁移性。最后,我们探讨了基于大语言模型和系统层面的防御措施,并讨论了OverThink攻击的社会、财务和能源影响。