Test-time scaling has significantly improved how AI models solve problems, yet current methods often get stuck in repetitive, incorrect patterns of thought. We introduce HEART, a framework that uses emotional cues to guide the model's focus, much as feelings contribute to human decision-making. By alternating between critical tones that sharpen error detection and encouraging tones that spark new ideas, HEART helps the model break out of dead-end reasoning and reach the correct solution. We evaluate HEART across seven high-difficulty benchmarks, including Humanity's Last Exam, GPQA Diamond, and LiveCodeBench, demonstrating robustness across diverse models. Results show that emotion facilitates deeper reasoning, yielding consistent accuracy gains over affect-sterile baselines. These findings suggest that the next frontier in machine reasoning lies in the strategic integration of affective regulation to guide logical synthesis.
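The alternation described above, cycling between a critical cue for error detection and an encouraging cue for exploration, can be sketched as a simple control loop. This is a minimal illustration, not the paper's implementation: the tone wordings, the `query` callable, and the `check` verifier are all hypothetical stand-ins.

```python
from typing import Callable

# Hypothetical tone cues; the abstract does not specify the actual prompt wording.
CRITICAL = "Carefully scrutinize your previous attempt and identify any flaw."
ENCOURAGING = "Good progress! Try a fresh angle with confidence."

def heart_loop(query: Callable[[str, str], str], problem: str,
               check: Callable[[str], bool], max_rounds: int = 4) -> str:
    """Alternate critical and encouraging cues until the answer passes `check`.

    `query(problem, tone)` stands in for a call to the underlying model;
    `check` stands in for whatever verifier or self-consistency test is used.
    """
    answer = query(problem, "")  # affect-sterile first attempt
    for round_idx in range(max_rounds):
        if check(answer):
            break
        # Even rounds sharpen error detection; odd rounds spark new ideas.
        tone = CRITICAL if round_idx % 2 == 0 else ENCOURAGING
        answer = query(problem, tone)
    return answer
```

The key design choice the abstract implies is that the affective cue is a scheduling signal, not extra task information: only the tone of the re-prompt changes between rounds, which is what lets the loop dislodge a model stuck in a repetitive reasoning pattern.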