Falcon-H1R: Pushing the Reasoning Frontiers with a Hybrid Model for Efficient Test-Time Scaling

Falcon LLM Team, Iheb Chaabane, Puneesh Khanna, Suhail Mohmad, Slim Frikha, Shi Hu, Abdalgader Abubaker, Reda Alami, Mikhail Lubinets, Mohamed El Amine Seddik, Hakim Hacid

This work introduces Falcon-H1R, a 7B-parameter reasoning-optimized model that demonstrates that small language models (SLMs) can achieve competitive reasoning performance. Falcon-H1R stands out for its parameter efficiency, consistently matching or outperforming state-of-the-art (SOTA) reasoning models that are $2\times$ to $7\times$ larger across a variety of reasoning-intensive benchmarks. These results underscore the importance of careful data curation and targeted training strategies (efficient SFT followed by RL scaling) in delivering significant performance gains without increasing model size. Furthermore, Falcon-H1R advances reasoning efficiency along three dimensions at once: faster inference (through its hybrid-parallel architecture design), token efficiency, and higher accuracy. This combination makes Falcon-H1R-7B a practical backbone for scaling advanced reasoning systems, particularly in scenarios requiring extensive chain-of-thought generation and parallel test-time scaling. Leveraging the recently introduced DeepConf approach, Falcon-H1R achieves state-of-the-art test-time scaling efficiency, offering substantial improvements in both accuracy and computational cost. Falcon-H1R thus demonstrates that compact models, through targeted training and architectural choices, can deliver robust and scalable reasoning performance.
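The abstract refers to parallel test-time scaling with DeepConf but does not describe the procedure. As a rough illustration of the general idea only (sample several reasoning traces in parallel, discard the least confident ones, then vote), the following Python sketch may help. It is not the DeepConf algorithm or the authors' implementation: the `Trace` type, the scalar `confidence` score in (0, 1], the `keep_fraction` parameter, and the function name are all hypothetical and introduced here for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Trace:
    answer: str        # final answer extracted from one sampled reasoning chain
    confidence: float  # hypothetical score in (0, 1]; higher means more confident


def confidence_filtered_vote(traces, keep_fraction=0.5):
    """Keep the most confident traces, then take a confidence-weighted vote.

    Illustrative assumption, not the method from the paper.
    """
    if not traces:
        raise ValueError("need at least one trace")
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")

    # Rank traces from most to least confident and keep the top fraction.
    ranked = sorted(traces, key=lambda t: t.confidence, reverse=True)
    keep = max(1, int(len(ranked) * keep_fraction))

    # Each surviving trace votes for its answer with weight = confidence.
    weights = defaultdict(float)
    for trace in ranked[:keep]:
        weights[trace.answer] += trace.confidence

    return max(weights, key=weights.get)


# Toy data: "17" has more raw votes, but most of them are low-confidence.
samples = [
    Trace("42", 0.90), Trace("42", 0.80), Trace("17", 0.95),
    Trace("17", 0.20), Trace("3", 0.10), Trace("17", 0.15),
]
result = confidence_filtered_vote(samples, keep_fraction=0.5)
```

On this toy input, a plain majority vote would return "17" (three votes against two), whereas filtering to the three most confident traces and weighting by confidence returns "42". In this sketch the compute saving would come from needing fewer surviving traces to reach a stable answer. How confidence is actually measured and when traces are stopped are details the abstract leaves to the full paper.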