Recent advances in Large Language Models (LLMs) have demonstrated remarkable progress in reasoning capabilities, with most approaches relying on Chain-of-Thought (CoT) rationales. Previous studies have shown that LLMs often generate logically inconsistent reasoning steps even when their final answers are correct, and these inconsistencies reduce the reliability of the reasoning process. We propose GeoSteer, a manifold-based framework that improves the quality of intermediate reasoning. The method consists of: (1) constructing a CoT dataset with step-level quality scores; (2) training a Variational Autoencoder (VAE) and a quality estimation model to learn a low-dimensional manifold of high-quality CoT trajectories; and (3) steering the hidden states of target LLMs toward higher-quality regions of the latent space by following quality gradients along the learned manifold, which yields geometrically coherent steering. We conducted evaluation experiments on the GSM8k dataset using the Qwen3 model series, measuring two metrics: answer accuracy and overall reasoning quality. Compared with the original LLMs, GeoSteer improved accuracy by 0.9 points and reasoning quality by 4.5 points on average. These results indicate that GeoSteer provides an effective and controllable mechanism for improving the quality of intermediate reasoning in LLMs.
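The steering step described above can be sketched in code. The following is a minimal, hypothetical PyTorch illustration, not the paper's implementation: all dimensions, network shapes, the `steer` function, and the step size are invented for exposition. It shows the core idea of step (3): encode a hidden state onto the learned latent manifold, ascend the gradient of a quality estimator in latent space, and decode back to the hidden-state space.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions; the paper's actual sizes are not stated in the abstract.
HIDDEN_DIM, LATENT_DIM = 64, 8

class CoTVAE(nn.Module):
    """Toy VAE mapping LLM hidden states onto a low-dimensional latent manifold."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Linear(HIDDEN_DIM, 2 * LATENT_DIM)  # outputs mean and log-variance
        self.dec = nn.Linear(LATENT_DIM, HIDDEN_DIM)

    def encode(self, h):
        mu, logvar = self.enc(h).chunk(2, dim=-1)
        return mu, logvar

    def decode(self, z):
        return self.dec(z)

# Hypothetical quality estimator: maps a latent point to a scalar step-quality score.
quality = nn.Sequential(nn.Linear(LATENT_DIM, 32), nn.Tanh(), nn.Linear(32, 1))

def steer(h, vae, quality_fn, steps=5, lr=0.1):
    """Gradient-ascend the latent code toward higher predicted quality,
    then decode the steered code back into the hidden-state space."""
    mu, _ = vae.encode(h)
    z = mu.detach().requires_grad_(True)
    for _ in range(steps):
        score = quality_fn(z).sum()
        (grad,) = torch.autograd.grad(score, z)
        z = (z + lr * grad).detach().requires_grad_(True)
    return vae.decode(z).detach()

vae = CoTVAE()
h = torch.randn(1, HIDDEN_DIM)      # stand-in for an LLM hidden state at one CoT step
h_steered = steer(h, vae, quality)  # same shape as the input hidden state
```

In the actual framework, `h_steered` would replace the model's hidden state during decoding; here both networks are untrained stand-ins, so the sketch only demonstrates the geometry of the update, not an improvement in reasoning quality.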