Diffusion Large Language Models (DLLMs) have emerged as a powerful alternative to autoregressive models, enabling parallel token generation across multiple positions. However, preference alignment of DLLMs remains challenging due to the high variance introduced by Evidence Lower Bound (ELBO)-based likelihood estimation. In this work, we propose AR-MAP, a novel transfer learning framework that leverages preference-aligned autoregressive LLMs (AR-LLMs) as implicit teachers for DLLM alignment. We reveal that DLLMs can effectively absorb alignment knowledge from AR-LLMs through simple weight scaling, exploiting the architectural structure shared between these divergent generation paradigms. Crucially, our approach circumvents the high variance and computational overhead of direct DLLM alignment. Comprehensive experiments across diverse preference alignment tasks demonstrate that AR-MAP achieves competitive or superior performance compared to existing DLLM-specific alignment methods, reaching a 69.08% average score across all tasks and models. Our code is available at https://github.com/AMAP-ML/AR-MAP.
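The transfer described above can be illustrated with a minimal sketch. This is not AR-MAP's exact procedure; it assumes a task-arithmetic-style merge in which the alignment delta between a preference-aligned AR-LLM and its base checkpoint is scaled by a hypothetical coefficient `alpha` and added to the DLLM's parameters, relying on the shared parameter names and shapes between the two architectures:

```python
def transfer_alignment(dllm_weights, ar_base_weights, ar_aligned_weights, alpha=1.0):
    """Hypothetical sketch: graft AR alignment knowledge onto a DLLM.

    All arguments are dicts mapping parameter names to tensors (here,
    plain floats for illustration). `alpha` is an assumed scaling
    coefficient, not necessarily the rule used by AR-MAP.
    """
    merged = {}
    for name, w in dllm_weights.items():
        if name in ar_base_weights and name in ar_aligned_weights:
            # Alignment delta learned by the AR teacher, scaled into the DLLM.
            delta = ar_aligned_weights[name] - ar_base_weights[name]
            merged[name] = w + alpha * delta
        else:
            # Parameters without an AR counterpart are left unchanged.
            merged[name] = w
    return merged
```

Because the merge happens purely in weight space, no ELBO-based likelihood estimation (and hence none of its variance) enters the alignment step.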