Realignment becomes necessary when a language model (LM) fails to meet expected performance. We propose a flexible realignment framework that supports quantitative control of the alignment degree during both training and inference. The framework includes Training-time Realignment (TrRa), which efficiently realigns the reference model through controllable fusion of logits from the reference model and an already aligned model. For example, TrRa reduces token usage on DeepSeek-R1-Distill-Qwen-1.5B by 54.63% without any performance degradation, surpassing DeepScaleR-1.5B's 33.86% reduction. To complement TrRa during inference, we introduce a layer adapter that enables smooth Inference-time Realignment (InRa). The adapter is initialized to perform an identity transformation and is inserted before the original bottom layer. During inference, input embeddings are processed in parallel by the adapter and the original bottom layer, passed through the remaining layers, and then controllably interpolated at the logit level. We upgraded DeepSeek-R1-Distill-Qwen-7B from a slow-thinking model to one that supports both fast and slow thinking, allowing flexible alignment control even at inference time. By encouraging deeper reasoning, the upgraded model even surpassed its original performance.
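The controllable logit fusion underlying both TrRa and InRa can be illustrated with a minimal sketch. The linear interpolation rule and the `alpha` knob below are illustrative assumptions, not the paper's exact fusion formula; the point is only that a single scalar interpolates between the reference model's next-token distribution (`alpha = 0`) and the aligned model's (`alpha = 1`), giving quantitative control over the alignment degree.

```python
import numpy as np

def fuse_logits(ref_logits: np.ndarray, aligned_logits: np.ndarray,
                alpha: float) -> np.ndarray:
    """Hypothetical controllable fusion: linearly interpolate the two
    models' next-token logits. alpha = 0 recovers the reference model,
    alpha = 1 recovers the aligned model, intermediate values give a
    quantitatively controlled degree of alignment."""
    return (1.0 - alpha) * ref_logits + alpha * aligned_logits

# Toy next-token logit vectors over a 4-token vocabulary.
ref = np.array([2.0, 0.5, -1.0, 0.0])
aligned = np.array([0.0, 3.0, 1.0, -2.0])

# Halfway between the two models' predictions.
half = fuse_logits(ref, aligned, 0.5)
```

In a decoding loop, the fused logits would replace either model's own logits before sampling, so the alignment degree can be dialed per request without retraining.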