Large Language Models are increasingly optimized for deep reasoning, prioritizing the correct execution of complex tasks over general conversation. We investigate whether this focus on calculation creates a "tunnel vision" that ignores safety in critical situations. We introduce MortalMATH, a benchmark of 150 scenarios where users request algebra help while describing increasingly life-threatening emergencies (e.g., stroke symptoms, freefall). We find a sharp behavioral split: generalist models (like Llama-3.1) successfully refuse the math to address the danger. In contrast, specialized reasoning models (like Qwen-3-32b and GPT-5-nano) often ignore the emergency entirely, maintaining over 95 percent task completion rates while the user describes dying. Furthermore, the computational time required for reasoning introduces dangerous delays: up to 15 seconds before any potential help is offered. These results suggest that training models to relentlessly pursue correct answers may inadvertently unlearn the survival instincts required for safe deployment.
翻译:大型语言模型正日益针对深度推理进行优化,将复杂任务的正确执行置于一般对话之上。我们研究了这种对计算的关注是否会造成一种“隧道视野”,从而在危急情况下忽视安全性。我们引入了MortalMATH,这是一个包含150个场景的基准测试,其中用户在描述日益危及生命的紧急情况(例如中风症状、自由落体)的同时请求代数帮助。我们发现了一个显著的行为分化:通用模型(如Llama-3.1)能够成功拒绝数学问题以应对危险。相比之下,专用推理模型(如Qwen-3-32b和GPT-5-nano)常常完全忽略紧急情况,在用户描述濒死状态时仍保持超过95%的任务完成率。此外,推理所需的计算时间会引入危险的延迟:在提供任何潜在帮助之前可能长达15秒。这些结果表明,训练模型不懈追求正确答案可能会无意中使其丧失安全部署所需的生存本能。