Best-of-N sampling is a powerful method for improving Large Language Model (LLM) performance, but it is often limited by its dependence on massive, text-based reward models. These models are not only computationally expensive but also data-hungry, requiring extensive labeled datasets for training. Crucially, they overlook a rich, readily available signal: the LLM's own internal hidden states. To address this data and efficiency gap, we introduce SWIFT (Simple Weighted Intrinsic Feedback Technique), a novel and lightweight method that learns a reward function directly from the information embedded in LLM hidden states. Operating at the token embedding level, SWIFT employs simple linear layers to distinguish between preferred and dispreferred generations, eliminating the need for computationally intensive text-based modeling. Extensive experiments on standard benchmarks show that SWIFT outperforms existing baselines (12.7% higher accuracy than EurusRM-7B on the MATH dataset) while using less than 0.005% of their parameters. Its robust scalability, its compatibility with certain closed-source models via logit access, and its ability to combine with traditional reward models for additional gains highlight SWIFT's practical value and its contribution to more efficient, data-driven LLM post-training. Our code is available at https://github.com/aster2024/SWIFT .
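The core mechanism described above, a lightweight linear head that scores candidate generations from their hidden states and picks the best of N, can be sketched as follows. This is a minimal illustration, not the paper's exact method: the pooling strategy, weighting scheme, and how the head is trained are assumptions here, and the hidden states are random toy data.

```python
import numpy as np

rng = np.random.default_rng(0)
HIDDEN_DIM = 16  # toy size; real LLM hidden states are far larger

# Hypothetical learned parameters of the linear reward head.
w = rng.normal(size=HIDDEN_DIM)
b = 0.0

def reward(hidden_states: np.ndarray) -> float:
    """Score one generation from its (num_tokens, HIDDEN_DIM) hidden states.

    Mean-pools over tokens, then applies a single linear layer. SWIFT's
    actual token weighting may differ; mean pooling is an assumption.
    """
    pooled = hidden_states.mean(axis=0)
    return float(pooled @ w + b)

def best_of_n(candidates: list) -> int:
    """Return the index of the highest-reward candidate generation."""
    return int(np.argmax([reward(h) for h in candidates]))

# Usage: score 4 candidate generations of varying token length.
cands = [rng.normal(size=(int(n), HIDDEN_DIM))
         for n in rng.integers(5, 20, size=4)]
idx = best_of_n(cands)
```

Because the head is a single linear layer over existing hidden states, scoring N candidates costs only N matrix-vector products on top of generation itself, which is the source of the parameter and compute savings the abstract claims.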