Diffusion language models (DLMs) have emerged as a promising alternative to the long-dominant autoregressive (AR) paradigm, offering a parallelizable decoding process that could yield greater efficiency. Yet, in practice, current open-source DLMs often underperform their AR counterparts in speed, limiting their real-world utility. This work presents a systematic study of DLM efficiency, identifying key issues in prior evaluation methods. Through empirical benchmarking and a roofline-based theoretical analysis, we demonstrate that AR models generally achieve higher throughput, while DLMs consistently lag. We also investigate acceleration strategies, finding that techniques such as dual cache and parallel decoding mainly offer gains at small batch sizes, with their benefits diminishing as batch size scales. Our findings underscore the necessity of robust evaluation methods and improved acceleration strategies to advance research on DLMs.
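The roofline analysis referenced above bounds attainable throughput by the minimum of peak compute and memory bandwidth times arithmetic intensity. A minimal sketch of that bound follows; the function name and hardware numbers are illustrative assumptions, not values from this work:

```python
def roofline_throughput(peak_flops: float, mem_bandwidth: float,
                        arithmetic_intensity: float) -> float:
    """Attainable FLOP/s under the roofline model.

    peak_flops:           hardware peak compute (FLOP/s)
    mem_bandwidth:        memory bandwidth (bytes/s)
    arithmetic_intensity: FLOPs performed per byte moved (FLOP/byte)
    """
    # A kernel is either compute-bound (hits peak_flops) or
    # memory-bound (limited by bandwidth * intensity).
    return min(peak_flops, mem_bandwidth * arithmetic_intensity)


# Illustrative hypothetical accelerator: 100 TFLOP/s peak, 2 TB/s bandwidth.
PEAK, BW = 100e12, 2e12

# Low intensity (e.g., small-batch decoding) is memory-bound:
low = roofline_throughput(PEAK, BW, 10)    # 2e13 FLOP/s, far below peak

# High intensity (e.g., large-batch prefill) saturates compute:
high = roofline_throughput(PEAK, BW, 100)  # 1e14 FLOP/s, at peak
```

This is the intuition behind why batching changes the picture: at small batch sizes decoding sits in the memory-bound regime where parallel decoding helps, while at large batch sizes the workload approaches the compute roof and those gains shrink.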