BenchOverflow: Measuring Overflow in Large Language Models via Plain-Text Prompts

We investigate a failure mode of large language models (LLMs) in which plain-text prompts elicit excessive outputs, a phenomenon we term Overflow. Unlike jailbreaks or prompt injection, Overflow arises under ordinary interaction settings and can lead to elevated serving cost, latency, and cross-user performance degradation, particularly when scaled across many requests. Beyond usability, the stakes are economic and environmental: unnecessary tokens increase per-request cost and energy consumption, compounding into substantial operational spend and carbon footprint at scale. Moreover, Overflow represents a practical vector for compute amplification and service degradation in shared environments. We introduce BenchOverflow, a model-agnostic benchmark of nine plain-text prompting strategies that amplify output volume without adversarial suffixes or policy circumvention. Using a standardized protocol with a fixed budget of 5,000 new tokens, we evaluate nine open- and closed-source models and observe pronounced rightward shifts and heavy tails in their output-length distributions. Cap-saturation rates (CSR@1k/3k/5k) and empirical cumulative distribution functions (ECDFs) quantify tail risk; within-prompt variance and cross-model correlations show that Overflow is broadly reproducible yet heterogeneous across model families and attack vectors. A lightweight mitigation, a fixed conciseness reminder, attenuates the right tails and lowers CSR for all strategies in the majority of models. Our findings position length control as a measurable reliability, cost, and sustainability concern rather than a stylistic quirk. By enabling standardized comparison of length-control robustness across models, BenchOverflow provides a practical basis for selecting deployments that minimize resource waste and operating expense, and for evaluating defenses that curb compute amplification without eroding task performance.
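The two tail-risk measures named above can be illustrated with a short sketch. This is not the BenchOverflow implementation: it assumes CSR@k is the fraction of responses whose generated length reaches at least k tokens, that lengths have already been counted in tokens, and the helper names `csr_at` and `ecdf` as well as the sample lengths are hypothetical.

```python
from bisect import bisect_right


def csr_at(lengths, cap):
    """Cap-saturation rate: share of responses with length >= cap tokens.

    Assumed definition for illustration; the benchmark may define it differently.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    return sum(1 for n in lengths if n >= cap) / len(lengths)


def ecdf(lengths):
    """Return F(x) = fraction of response lengths that are <= x."""
    xs = sorted(lengths)
    n = len(xs)

    def F(x):
        # bisect_right counts how many sorted lengths are <= x
        return bisect_right(xs, x) / n

    return F


# Hypothetical token lengths for one model/strategy pair under a 5,000-token budget
# (illustrative values only, not results from the paper).
lengths = [120, 850, 1000, 2400, 3100, 5000, 5000, 640]

for cap in (1000, 3000, 5000):
    print(f"CSR@{cap // 1000}k = {csr_at(lengths, cap):.3f}")

F = ecdf(lengths)
print(f"ECDF at 1,000 tokens = {F(1000):.3f}")
```

With these sample lengths the sketch prints CSR@1k = 0.625, CSR@3k = 0.375, CSR@5k = 0.250, and an ECDF value of 0.500 at 1,000 tokens. CSR@5k here equals the share of responses that hit the generation budget, which is why heavier right tails show up directly as higher cap-saturation rates.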

翻译：本文研究大型语言模型（LLMs）的一种失效模式：纯文本提示词引发过量输出，我们将此现象称为溢出。与越狱攻击或提示注入不同，溢出在普通交互场景下即可出现，可能导致服务成本上升、延迟增加及跨用户性能下降，尤其在请求规模扩大时更为显著。除可用性问题外，溢出还涉及经济与环境风险：冗余的token会提升单次请求成本与能耗，在规模化部署中将累积为可观的运营开支与碳足迹。此外，溢出在共享环境中构成计算资源放大与服务降级的潜在攻击向量。我们提出BenchOverflow——一个包含九种纯文本提示策略的模型无关基准，这些策略无需对抗性后缀或策略规避即可放大输出量。通过采用固定5000新token预算的标准化协议，我们对九个开源与闭源模型进行评估，观察到输出长度分布呈现明显的右移与重尾特征。容量饱和率（CSR@1k/3k/5k）与经验累积分布函数（ECDF）量化了尾部风险；提示内方差与跨模型相关性表明溢出效应具有广泛可复现性，但在不同模型家族与攻击向量间存在异质性。一种轻量级缓解措施——固定的简洁性提示——能够削弱右尾分布并降低大多数模型中所有策略的CSR。我们的研究将长度控制定位为可量化的可靠性、成本与可持续性问题，而非风格偏好。BenchOverflow通过建立跨模型长度控制鲁棒性的标准化比较框架，为选择最小化资源浪费与运营成本的部署方案、以及评估在不削弱任务性能前提下抑制计算放大的防御机制提供了实践基础。