S0 tuning optimizes a single initial state matrix per recurrent layer while freezing all model weights, and it adds zero inference overhead. Trained on roughly 48 execution-verified HumanEval training solutions, it outperforms LoRA by +10.8 pp (p < 0.001) on HumanEval. On Qwen3.5-4B (GatedDeltaNet hybrid), S0 tuning improves greedy pass@1 by +23.6 ± 1.7 pp (10 seeds). On FalconH1-7B (Mamba-2 hybrid), S0 reaches 71.8% ± 1.3 and LoRA reaches 71.4% ± 2.4 (3 seeds); the two are statistically indistinguishable at this sample size, and S0 requires no weight merging. Cross-domain transfer is significant on MATH-500 (+4.8 pp, p = 0.00002, 8 seeds) and GSM8K (+2.8 pp, p = 0.0003, 10 seeds); a text-to-SQL benchmark (Spider) shows no transfer, consistent with the trajectory-steering mechanism. A prefix-tuning control on a pure Transformer (Qwen2.5-3B) degrades performance (-13.9 pp) under all nine configurations tested. On Qwen3.5, a per-step state-offset variant reaches +27.1 pp, above both S0 and LoRA, but incurs per-step inference cost. Taken together, the results show that recurrent state initialization is a strong zero-inference-overhead surface for parameter-efficient fine-tuning (PEFT) of hybrid language models when verified supervision is scarce. The tuned state is a ~48 MB file; task switching requires no weight merging or model reload. Code and library: https://github.com/jackyoung27/s0-tuning.
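To make the core idea concrete, the following is a minimal toy sketch of tuning only an initial recurrent state while every weight stays frozen. It is not the authors' implementation and does not use the s0-tuning library: the layer is a hypothetical diagonal linear recurrence (not GatedDeltaNet or Mamba-2), the state is a short vector rather than a matrix, and all names (`run_layer`, `tune_initial_state`, `DECAY`, `W_IN`, `W_OUT`) and the data are illustrative assumptions.

```python
# Toy illustration only: a frozen diagonal linear recurrence with a learnable
# initial state. All names and numbers here are hypothetical.

def run_layer(h0, xs, decay, w_in, w_out):
    """Run h_t = decay * h_{t-1} + w_in * x_t (elementwise); return w_out . h_T."""
    h = list(h0)
    for x in xs:
        h = [d * hi + wi * x for d, hi, wi in zip(decay, h, w_in)]
    return sum(wo * hi for wo, hi in zip(w_out, h))


def tune_initial_state(examples, decay, w_in, w_out, lr=0.1, steps=200):
    """Gradient descent on mean squared error with respect to h0 only."""
    h0 = [0.0] * len(decay)
    for _ in range(steps):
        grad = [0.0] * len(h0)
        for xs, target in examples:
            err = run_layer(h0, xs, decay, w_in, w_out) - target
            steps_in_seq = len(xs)
            # For this linear recurrence, d(output)/d(h0[i]) = w_out[i] * decay[i] ** T.
            for i in range(len(h0)):
                grad[i] += 2.0 * err * w_out[i] * decay[i] ** steps_in_seq
        # Only h0 is updated; decay, w_in, and w_out are never modified.
        h0 = [h - lr * g / len(examples) for h, g in zip(h0, grad)]
    return h0


# Frozen "model weights" and a tiny supervised set of (input sequence, target) pairs.
DECAY = [0.9, 0.5]
W_IN = [1.0, -1.0]
W_OUT = [0.7, 0.3]
EXAMPLES = [([1.0, 0.0, 2.0], 3.0), ([0.5, 0.5, 0.5], 2.0)]


def mean_sq_error(h0):
    errors = [(run_layer(h0, xs, DECAY, W_IN, W_OUT) - t) ** 2 for xs, t in EXAMPLES]
    return sum(errors) / len(errors)


zero_state = [0.0, 0.0]
tuned_state = tune_initial_state(EXAMPLES, DECAY, W_IN, W_OUT)
print(mean_sq_error(zero_state), mean_sq_error(tuned_state))
```

In this sketch the tuned state replaces the default zero state before the first step, so the forward pass does the same amount of work as the untuned one. Switching tasks amounts to loading a different `h0`, which loosely mirrors the abstract's point that no weight merging or model reload is needed.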