Can a large language model (LLM) improve at code generation using only its own raw outputs, without a verifier, a teacher model, or reinforcement learning? We answer in the affirmative with simple self-distillation (SSD): sample solutions from the model with certain temperature and truncation configurations, then fine-tune on those samples with standard supervised fine-tuning. SSD improves Qwen3-30B-Instruct from 42.4% to 55.3% pass@1 on LiveCodeBench v6, with gains concentrating on harder problems, and it generalizes across Qwen and Llama models at 4B, 8B, and 30B scale, including both instruct and thinking variants. To understand why such a simple method can work, we trace these gains to a precision-exploration conflict in LLM decoding and show that SSD reshapes token distributions in a context-dependent way, suppressing distractor tails where precision matters while preserving useful diversity where exploration matters. Taken together, SSD offers a complementary post-training direction for improving LLM code generation.
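The sampling step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which truncation scheme or which values are used, so top-k and top-p (nucleus) truncation are assumed here as common choices, and the names `truncated_distribution`, `sample_token`, `build_sft_dataset`, and the `generate` callable are hypothetical. The sketch shows how temperature scaling and truncation reshape a next-token distribution, and how raw samples would be paired with their prompts for supervised fine-tuning without any verifier or filtering.

```python
import math
import random


def truncated_distribution(logits, temperature=1.0, top_k=None, top_p=None):
    """Return sampling probabilities after temperature scaling and truncation.

    top_k / top_p are assumed truncation schemes; the abstract only says
    "truncation configurations" without naming them.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    # Temperature-scaled softmax, shifted by the max for numerical stability.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank token ids from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    if top_k is not None:
        order = order[:top_k]

    if top_p is not None:
        # Keep the smallest prefix whose cumulative mass reaches top_p.
        kept, mass = [], 0.0
        for i in order:
            kept.append(i)
            mass += probs[i]
            if mass >= top_p:
                break
        order = kept

    # Zero out the truncated tail and renormalize the surviving tokens.
    kept_mass = sum(probs[i] for i in order)
    out = [0.0] * len(probs)
    for i in order:
        out[i] = probs[i] / kept_mass
    return out


def sample_token(logits, rng, **kwargs):
    """Draw one token id from the truncated distribution."""
    probs = truncated_distribution(logits, **kwargs)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def build_sft_dataset(prompts, generate, n_samples=1):
    """Pair each prompt with raw, unfiltered samples from the model itself.

    `generate` is a hypothetical callable (prompt -> completion string)
    standing in for decoding with the settings above.
    """
    return [
        {"prompt": p, "completion": generate(p)}
        for p in prompts
        for _ in range(n_samples)
    ]


if __name__ == "__main__":
    toy_logits = [4.0, 3.5, 1.0, 0.5, -2.0]  # illustrative values only
    print(truncated_distribution(toy_logits, temperature=1.0, top_p=0.9))
    print(sample_token(toy_logits, random.Random(0), temperature=1.5, top_k=3))
```

The resulting prompt and completion pairs would then go to an ordinary supervised fine-tuning loop. The fine-tuning step itself is omitted here because the abstract gives no details beyond "standard supervised fine-tuning".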