Large language models produce repetitive output when prompted independently across many batches. We term this phenomenon cross-batch mode collapse: the progressive loss of output diversity when a language model is prompted repeatedly without access to its prior generations. Practitioners have long mitigated it with ad hoc deduplication and seed rotation, but no principled framework exists. We introduce Dynamic Context Evolution (DCE), which comprises three mechanisms: (1) verbalized tail sampling (VTS), in which the model labels each idea with its own estimate of how obvious the idea is, and ideas it judges obvious (high-probability) are discarded; (2) semantic memory, which maintains a persistent embedding index to reject near-duplicates across batches; and (3) adaptive prompt evolution, which reconstructs the generation prompt each batch using memory state and rotating diversity strategies. In experiments across three domains (sustainable packaging concepts, educational exam questions, and creative writing prompts) and two model families (gpt-5-mini and claude-haiku-4-5), a component ablation with 2-3 random seeds per method shows that DCE achieves 0.0 ± 0.0% collapse versus 5.6 ± 2.0% for naive prompting. DCE also produces 17-18 HDBSCAN clusters per seed, whereas naive prompting varies between 2 and 17, indicating reliably richer conceptual structure. These results are validated with an independent embedding model (all-MiniLM-L6-v2) and hold across sensitivity sweeps of the VTS threshold tau and the deduplication threshold delta. Deduplication and prompt evolution are individually insufficient but jointly effective. DCE costs approximately $0.50 per 1,000 candidates and uses only standard API calls, with no fine-tuning or custom architectures required.
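As a rough illustration of how the three mechanisms could fit together, the following Python sketch runs two simulated batches through a VTS filter, a semantic memory, and a rebuilt prompt. It is not the authors' implementation. The bag-of-words `embed` function stands in for a real sentence encoder, the candidate lists stand in for model output, and the names `vts_filter`, `SemanticMemory`, `build_prompt`, `run_batch`, and `STRATEGIES`, along with the threshold values `tau=0.5` and `delta=0.8`, are assumptions chosen only for this example.

```python
import math
import re
from collections import Counter

# Hypothetical diversity strategies; the abstract does not list the real ones.
STRATEGIES = ["change the target user", "combine two unrelated fields", "invert a common assumption"]


def embed(text):
    """Toy bag-of-words vector; a stand-in for a real sentence-embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def vts_filter(candidates, tau):
    """Keep ideas whose self-reported probability of being obvious is below tau."""
    return [text for text, p in candidates if p < tau]


class SemanticMemory:
    """Persistent index that rejects candidates too similar to anything already stored."""

    def __init__(self, delta):
        self.delta = delta
        self.items = []  # list of (text, embedding) pairs kept across batches

    def add_if_novel(self, text):
        vec = embed(text)
        if any(cosine(vec, old) >= self.delta for _, old in self.items):
            return False  # near-duplicate of a stored idea
        self.items.append((text, vec))
        return True


def build_prompt(task, memory, batch_index, n_recent=3):
    """Rebuild the prompt from memory state and a rotating strategy."""
    strategy = STRATEGIES[batch_index % len(STRATEGIES)]
    recent = [text for text, _ in memory.items[-n_recent:]]
    avoid = "\n".join(f"- {t}" for t in recent) or "- (nothing yet)"
    return (
        f"{task}\nStrategy: {strategy}\n"
        f"Already covered, do not repeat:\n{avoid}\n"
        "For each idea, also give a probability in [0, 1] that a typical answer would include it."
    )


def run_batch(candidates, memory, tau):
    """Apply the VTS filter, then admit only ideas the memory considers novel."""
    return [t for t in vts_filter(candidates, tau) if memory.add_if_novel(t)]


memory = SemanticMemory(delta=0.8)
# Simulated (idea, self-reported probability) pairs in place of real API responses.
batch_1 = [("recycled cardboard box", 0.9), ("mushroom mycelium foam insert", 0.2)]
batch_2 = [("mushroom mycelium foam insert", 0.15), ("seaweed film pouch", 0.1)]

accepted_1 = run_batch(batch_1, memory, tau=0.5)
next_prompt = build_prompt("Propose sustainable packaging concepts.", memory, batch_index=1)
accepted_2 = run_batch(batch_2, memory, tau=0.5)
```

In this toy run the obvious idea in the first batch is dropped by the VTS filter, and the repeated idea in the second batch is rejected by the memory even though it passes the filter, which mirrors the abstract's point that the components cover different failure cases.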