Distributional Clarity: The Hidden Driver of RL-Friendliness in Large Language Models

Language model families exhibit a striking disparity in their capacity to benefit from reinforcement learning (RL): under identical training, models like Qwen achieve substantial gains, while others like Llama show limited improvement. Complementing data-centric approaches, we show that this disparity reflects a hidden structural property: \textbf{distributional clarity} in probability space. Through a three-stage analysis (from phenomenon to mechanism to interpretation), we find that RL-friendly models exhibit intra-class compactness and inter-class separation in the probabilities they assign to correct versus incorrect responses. We quantify this clarity using the \textbf{Silhouette Coefficient} ($S$) and demonstrate that (1) high $S$ correlates strongly with RL performance, and (2) low $S$ is associated with severe logic errors and reasoning instability. To validate this property, we introduce a Silhouette-Aware Reweighting strategy that prioritizes low-$S$ samples during training. Experiments on six mathematical benchmarks show consistent improvements across all model families, with gains of up to 5.9 points on AIME24. Our work establishes distributional clarity as a fundamental, trainable property underlying RL-friendliness.
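
To make the clarity measure concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each sampled response to a prompt is summarized by a single scalar (here, a hypothetical sequence log-probability), that responses are labeled correct or incorrect, and that distance is the absolute difference between scalars. The silhouette computation follows the standard textbook definition; the function names and the linear weighting rule `reweight` (with its `alpha` parameter) are illustrative assumptions, since the abstract only states that low-$S$ samples are prioritized.

```python
from statistics import mean


def silhouette_coefficient(values, labels):
    """Mean silhouette over 1-D points, using |x - y| as the distance.

    values: one scalar per response (assumed here: a sequence log-probability).
    labels: class label per response (e.g. True = correct, False = incorrect).
    """
    classes = set(labels)
    if len(classes) < 2:
        raise ValueError("silhouette needs at least two classes")

    points = list(zip(values, labels))
    scores = []
    for i, (x, c) in enumerate(points):
        # a: mean distance to the other members of the same class (compactness)
        same = [abs(x - y) for j, (y, d) in enumerate(points) if d == c and j != i]
        if not same:
            scores.append(0.0)  # usual convention for a singleton class
            continue
        a = mean(same)
        # b: smallest mean distance to the members of any other class (separation)
        b = min(
            mean(abs(x - y) for y, d in points if d == other)
            for other in classes
            if other != c
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return mean(scores)


def reweight(s, alpha=1.0):
    """Hypothetical weighting rule: maps S in [-1, 1] to [1, 1 + alpha],
    so that prompts with lower S receive a larger training weight."""
    return 1.0 + alpha * (1.0 - s) / 2.0


# Illustrative, made-up log-probabilities for two prompts (three correct and
# three incorrect responses each).
labels = [True, True, True, False, False, False]
clear_values = [-1.0, -1.1, -0.9, -5.0, -5.2, -4.8]   # classes well separated
blurry_values = [-1.0, -3.0, -5.0, -1.1, -3.1, -4.9]  # classes interleaved

clear_S = silhouette_coefficient(clear_values, labels)
blurry_S = silhouette_coefficient(blurry_values, labels)
print("clear:", clear_S, "weight:", reweight(clear_S))
print("blurry:", blurry_S, "weight:", reweight(blurry_S))
```

In this toy setup the well-separated prompt yields a higher $S$ and therefore a smaller weight than the interleaved one, which mirrors the stated intent of prioritizing low-$S$ samples. The choice of scalar, distance, and weighting function would all need to be checked against the paper's actual definitions.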
