Large language models (LLMs) now generate substantial amounts of production code, often for tasks that admit multiple valid algorithmic solutions. Incidental prompt cues, meaning contextual words or metadata outside the task specification, can steer which algorithm the model selects, even when all outputs pass the same tests. Prompt sensitivity is well studied as a tool to improve output quality; here we study its effect on output policy, meaning algorithm choice under fixed correctness. We define algorithm steering as cue-induced shifts in algorithm-family distributions and run 46,535 controlled experiments across 11 tasks, 19 cue types (18 channels plus a memoization semantic-vs-surface ablation that preserves meaning while changing typography and punctuation), and 15 model configurations. We find large, systematic shifts in algorithm-family distributions (up to 100 percentage points), largely consistent with cue semantics, including in applied tasks such as rate limiting. Direct algorithm naming is the most reliable mitigation we tested. Accidental context therefore creates an "invisible lottery" over performance, security, and maintainability.
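To make the notion of a shift in algorithm-family distributions concrete, the following is a minimal sketch of how a per-family shift in percentage points could be computed from labeled outputs. It is an illustration under stated assumptions, not the measurement procedure used in this work: the function names, the family labels (`token_bucket`, `sliding_window`), and the sample counts are all hypothetical.

```python
from collections import Counter


def family_shares(labels):
    """Return each algorithm family's share of the outputs, in percent."""
    if not labels:
        raise ValueError("labels must be non-empty")
    counts = Counter(labels)
    total = len(labels)
    return {family: 100.0 * n / total for family, n in counts.items()}


def steering_shift_pp(baseline_labels, cued_labels):
    """Per-family shift (cued minus baseline) in percentage points."""
    base = family_shares(baseline_labels)
    cued = family_shares(cued_labels)
    # A family missing from one condition has a share of 0 there.
    families = set(base) | set(cued)
    return {f: cued.get(f, 0.0) - base.get(f, 0.0) for f in families}


# Hypothetical family labels for ten generations per condition
# on a rate-limiting task (illustrative data, not results).
baseline = ["token_bucket"] * 6 + ["sliding_window"] * 4
cued = ["token_bucket"] * 1 + ["sliding_window"] * 9

shift = steering_shift_pp(baseline, cued)
for family, delta in sorted(shift.items()):
    print(f"{family}: {delta:+.1f} pp")
```

In this toy input the cue moves the `sliding_window` share from 40% to 90%, a shift of 50 percentage points. A shift of 100 percentage points would correspond to a family going from never chosen to always chosen, or the reverse.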