We present DEMON, a real-time diffusion engine that makes the denoising process playable as a live musical instrument: a control surface both broad (many parameters shaped per-frame across the output) and responsive (each control taking effect as fast as its place in the denoising loop allows). Built on ACE-Step 1.5 and StreamDiffusion's ring-buffer architecture with TensorRT acceleration, it sustains up to 12.3 decoder completions per second for 60-second music on a single consumer GPU (RTX 5090), or 11.3 generations per second at our production ring-depth of 4. At these rates denoising parameters become viable as live performance controls, but the ring buffer propagates per-request changes only at its drain rate, a floor of S denoising steps. We contribute four mechanisms. (1) Per-slot heterogeneous denoise scheduling: each ring-buffer slot owns its timestep schedule, so a moving denoise slider is tracked without wiping the in-flight queue, where the upstream global-schedule design must rebuild and discard it. (2) Shared mutable per-step state, giving any parameter consulted at every solver step next-tick effect, bypassing ring-buffer drain. (3) Per-frame source blending: a sampling-time control on the standard SDE re-noise step, giving a framewise transformation-strength axis that complements scalar denoise scheduling. (4) Windowed VAE decode exploiting receptive-field analysis for an 8.0x decode speedup. Together these separate streaming-diffusion parameters into four propagation classes, by onset and convergence latency.
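As a rough illustration of mechanisms (1) and (2), the sketch below models a ring of in-flight requests in which each slot carries its own timestep schedule and every solver step reads one shared mutable state object. It is a toy under stated assumptions, not DEMON's implementation: the names `SharedState`, `Slot`, `Ring`, `make_schedule`, and `guidance` are hypothetical, the linear schedule is a placeholder, the ring depth is taken equal to the step count, and no model is called.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SharedState:
    """Mutable state consulted at every solver step (hypothetical name)."""
    guidance: float = 1.0


@dataclass
class Slot:
    """One in-flight request that owns its timestep schedule."""
    schedule: list
    step: int = 0
    seen_guidance: list = field(default_factory=list)

    @property
    def done(self) -> bool:
        return self.step >= len(self.schedule)


def make_schedule(denoise: float, steps: int) -> list:
    """Placeholder linear schedule from `denoise` down toward 0."""
    return [denoise * (steps - i) / steps for i in range(steps)]


class Ring:
    """Toy ring buffer: one request admitted and one step taken per tick."""

    def __init__(self, steps: int, shared: SharedState):
        self.steps = steps
        self.shared = shared
        self.slots = deque()

    def tick(self, denoise: float) -> list:
        # Admit a new request at the current slider value. Slots already
        # in flight keep the schedules they were admitted with, so nothing
        # is rebuilt or discarded when the slider moves.
        self.slots.append(Slot(make_schedule(denoise, self.steps)))
        for slot in self.slots:
            _t = slot.schedule[slot.step]  # per-slot timestep for this step
            # A real engine would run the denoiser here; the sketch only
            # records which shared value this step observed.
            slot.seen_guidance.append(self.shared.guidance)
            slot.step += 1
        finished = []
        while self.slots and self.slots[0].done:
            finished.append(self.slots.popleft())
        return finished
```

In this toy, a change to `shared.guidance` is seen by every in-flight slot on the very next `tick`, while a change to the `denoise` argument affects only newly admitted slots and so reaches the output only after the ring drains. That is the contrast between next-tick and drain-rate propagation that the abstract draws.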