Table reasoning with large language models (LLMs) plays a critical role in building intelligent systems that can understand and analyze tabular data. Despite recent progress, existing methods still face key limitations: their reasoning processes lack depth and explicit multi-step reasoning, often relying solely on the implicit understanding of the language model, and they suffer from instability, primarily caused by model uncertainty. In this work, we propose STaR, a novel slow-thinking model that achieves effective and stable table reasoning. To enable effective multi-step reasoning, we design a two-stage training framework consisting of a supervised fine-tuning (SFT) warm-up followed by reinforced fine-tuning (RFT). In the SFT stage, we construct a high-quality dataset through automatic self-verification; in the RFT stage, we introduce a difficulty-aware reinforcement learning mechanism to further enhance reasoning capabilities. Furthermore, to improve reasoning stability, we introduce trajectory-level uncertainty quantification, which fuses token-level confidence with answer-level consistency to select better reasoning trajectories. Extensive experiments demonstrate that STaR-8B achieves state-of-the-art performance on in-domain benchmarks and generalizes strongly to out-of-domain datasets, highlighting its potential to improve both the effectiveness and the stability of table reasoning.
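The fused trajectory-selection idea can be pictured with a minimal sketch. This is an illustration under assumptions, not the paper's implementation: token-level confidence is taken here as the geometric mean of per-token probabilities, answer-level consistency as the agreement rate among sampled trajectories, and the two are fused with a hypothetical weight `alpha`; the names `Trajectory`, `token_confidence`, and `select_trajectory` are illustrative.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import List

@dataclass
class Trajectory:
    answer: str                  # final answer parsed from the trajectory
    token_logprobs: List[float]  # per-token log-probabilities from decoding

def token_confidence(traj: Trajectory) -> float:
    """Geometric mean of per-token probabilities: exp(mean log-prob)."""
    return math.exp(sum(traj.token_logprobs) / len(traj.token_logprobs))

def select_trajectory(trajs: List[Trajectory], alpha: float = 0.5) -> Trajectory:
    """Pick the trajectory maximizing a fused score:
    alpha * token-level confidence + (1 - alpha) * answer-level consistency,
    where consistency is the share of sampled trajectories with the same answer.
    """
    counts = Counter(t.answer for t in trajs)
    n = len(trajs)
    return max(
        trajs,
        key=lambda t: alpha * token_confidence(t)
                      + (1 - alpha) * counts[t.answer] / n,
    )

# Example: two of three sampled trajectories agree on "42"; the lone "7"
# trajectory has higher token confidence, but the fused score still picks "42".
trajs = [
    Trajectory("42", [-0.1, -0.2, -0.1]),
    Trajectory("42", [-0.5, -0.6, -0.4]),
    Trajectory("7",  [-0.05, -0.05, -0.05]),
]
print(select_trajectory(trajs).answer)  # -> "42" with alpha = 0.5
```

With `alpha = 0.5`, agreement across trajectories counterbalances the per-token confidence of a single outlier; the actual fusion rule and weighting in STaR may differ.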