Large Reasoning Models (LRMs) with long chain-of-thought reasoning have recently achieved remarkable success. Yet, equipping domain-specialized models with such reasoning capabilities, referred to as "Reasoning + X", remains a significant challenge. While model merging offers a promising training-free solution, existing methods often suffer from a destructive performance collapse: they tend to weaken reasoning depth while simultaneously compromising domain-specific utility. Interestingly, we identify a counter-intuitive phenomenon underlying this failure: reasoning ability predominantly resides in parameter regions with low gradient sensitivity, contrary to the common assumption that domain capabilities correspond to high-magnitude parameters. Motivated by this insight, we propose ReasonAny, a novel merging framework that resolves the reasoning-domain performance collapse through Contrastive Gradient Identification. Experiments across the safety, biomedicine, and finance domains show that ReasonAny effectively synthesizes "Reasoning + X" capabilities, significantly outperforming state-of-the-art baselines while retaining robust reasoning performance.