Self-correction in language models remains elusive. In this work, we explore whether language models can explicitly localize errors in incorrect reasoning, as a path toward building AI systems that can effectively correct themselves. We introduce a prompting method that structures reasoning as discrete, semantically coherent thought steps, and show that models can reliably localize errors within this structure, while failing to do so in conventional, unstructured chain-of-thought reasoning. Motivated by how the human brain monitors errors at discrete decision points and resamples alternatives, we introduce Iterative Correction Sampling of Thoughts (Thought-ICS), a self-correction framework. Thought-ICS iteratively prompts the model to generate reasoning one discrete, complete thought at a time, where each thought represents a deliberate decision by the model, creating natural boundaries for precise error localization. When verification flags the reasoning as incorrect, the model localizes the first erroneous step, and the system backtracks to the last correct point and generates alternative reasoning from there. When asked to correct reasoning verified as incorrect by an oracle, Thought-ICS achieves a 20-40% lift in self-correction; in a fully autonomous setting without external verification, it outperforms contemporary self-correction baselines.
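For concreteness, below is a minimal Python sketch of the control loop described above. The callables (`next_thought`, `is_complete`, `verify`, `first_error`) are hypothetical placeholders standing in for the underlying model prompts; they are assumptions for illustration, not names from the paper's implementation.

```python
from typing import Callable, List

def thought_ics(
    problem: str,
    next_thought: Callable[[str, List[str]], str],  # generate one discrete thought
    is_complete: Callable[[List[str]], bool],       # has the chain reached a final answer?
    verify: Callable[[str, List[str]], bool],       # does the full chain pass verification?
    first_error: Callable[[str, List[str]], int],   # index of the first erroneous thought
    max_rounds: int = 5,
) -> List[str]:
    """Sketch of the Thought-ICS loop: generate, verify, localize, backtrack, resample."""
    thoughts: List[str] = []

    def extend() -> None:
        # Generate reasoning one discrete, complete thought at a time,
        # creating the step boundaries used later for error localization.
        while not is_complete(thoughts):
            thoughts.append(next_thought(problem, thoughts))

    extend()
    for _ in range(max_rounds):
        if verify(problem, thoughts):
            break                       # chain verified as correct; stop
        k = first_error(problem, thoughts)
        del thoughts[k:]                # backtrack to the last correct point
        extend()                        # resample an alternative continuation
    return thoughts
```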