We present Seed Diffusion Preview, a large-scale language model based on discrete-state diffusion that offers remarkably fast inference. Because generation is non-sequential and parallel, discrete diffusion models sidestep the inherent latency of token-by-token decoding, as recent systems (e.g., Mercury Coder, Gemini Diffusion) have demonstrated. Seed Diffusion Preview achieves an inference speed of 2,146 tokens/s on H20 GPUs while maintaining competitive performance across a sweep of standard code evaluation benchmarks; it is significantly faster than the contemporary Mercury and Gemini Diffusion models, establishing a new state of the art on the speed-quality Pareto frontier for code models.
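To make the contrast with token-by-token decoding concrete, the following is a minimal, purely illustrative sketch of confidence-based parallel decoding with a masked discrete-diffusion model; it is not Seed Diffusion Preview's actual procedure, and the `model` stand-in, constants, and unmasking schedule are assumptions for illustration only.

```python
# Illustrative sketch (not the paper's method): a generic parallel decoding loop
# for a masked discrete-diffusion language model. `model` is a hypothetical
# stand-in for a trained denoiser.
import numpy as np

MASK, VOCAB, LENGTH, STEPS = -1, 1000, 16, 4
rng = np.random.default_rng(0)

def model(tokens):
    """Stand-in denoiser: returns per-position logits over the vocabulary.
    A real model would condition on the prompt and the current partial sequence."""
    return rng.normal(size=(len(tokens), VOCAB))

tokens = np.full(LENGTH, MASK)               # start from a fully masked sequence
for step in range(STEPS):
    logits = model(tokens)                   # one forward pass predicts ALL positions
    probs = np.exp(logits - logits.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)
    preds, conf = probs.argmax(-1), probs.max(-1)
    masked = np.where(tokens == MASK)[0]
    # Commit the most confident fraction of still-masked positions at each step,
    # so LENGTH tokens are produced in STEPS forward passes rather than LENGTH.
    k = max(1, len(masked) // (STEPS - step))
    chosen = masked[np.argsort(-conf[masked])[:k]]
    tokens[chosen] = preds[chosen]

print(tokens)
```

The point of the sketch is only that each forward pass fills in many positions at once, which is the source of the latency advantage over autoregressive decoding claimed above.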