The transition from fitting empirical data to achieving true human utility is fundamentally constrained by a granularity mismatch: fine-grained autoregressive generation is often supervised by coarse or uniform signals. This position paper advocates Token Priority as the essential bridge, formalizing Supervised Fine-Tuning (SFT) not as simple optimization but as a precise distribution-reshaping process that aligns raw data with the ideal alignment manifold. We analyze recent breakthroughs through this unified lens, categorizing them into two distinct regimes: Positive Priority for noise filtration and Signed Priority for unlearning toxic modes. We revisit existing progress and limitations, identify key challenges, and suggest directions for future research.
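The two regimes can be illustrated with a minimal sketch of a token-weighted SFT objective. This is not the paper's formalization, only an illustration under stated assumptions: per-token log-probabilities are given, the function name and the normalization by total absolute weight are hypothetical choices. Positive Priority corresponds to non-negative weights (zero filters a noisy token); Signed Priority additionally allows negative weights, whose gradient pushes probability mass away from a toxic token.

```python
def weighted_sft_loss(token_logprobs, weights):
    """Token-priority SFT loss (illustrative sketch).

    token_logprobs: per-token log p(y_t | y_<t, x) from the model.
    weights: per-token priority.
      - Uniform weights recover standard SFT (mean NLL).
      - Zero weight drops a token (Positive Priority / noise filtration).
      - Negative weight penalizes likelihood (Signed Priority / unlearning).
    Normalized by total absolute weight so scale is comparable across regimes.
    """
    norm = sum(abs(w) for w in weights)
    if norm == 0.0:
        raise ValueError("all weights are zero")
    # Weighted negative log-likelihood over the sequence.
    return sum(-w * lp for w, lp in zip(weights, token_logprobs)) / norm


# Example: a 3-token sequence where token 1 is noisy or toxic.
logprobs = [-0.1, -2.3, -0.2]
uniform = weighted_sft_loss(logprobs, [1, 1, 1])   # standard SFT
filtered = weighted_sft_loss(logprobs, [1, 0, 1])  # Positive Priority
signed = weighted_sft_loss(logprobs, [1, -1, 1])   # Signed Priority
```

Under uniform weights the noisy token dominates the loss; zeroing its weight removes that contribution, while a negative weight actively rewards lowering its likelihood.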