Diffusion Policy has dominated action generation owing to its strong capability for modeling multi-modal action distributions, but its multi-step denoising process makes it impractical for real-time visuomotor control. Existing caching-based acceleration methods typically rely on $\textit{static}$ schedules that fail to adapt to the $\textit{dynamics}$ of robot-environment interactions, thereby leading to suboptimal performance. In this paper, we propose $\underline{\textbf{S}}$parse $\underline{\textbf{A}}$ction$\underline{\textbf{G}}$en ($\textbf{SAG}$) for extremely sparse action generation. To accommodate these iterative interactions, SAG customizes a rollout-adaptive prune-then-reuse mechanism that first identifies prunable computations globally and then reuses cached activations to substitute for them during action diffusion. To capture rollout dynamics, SAG parameterizes an observation-conditioned diffusion pruner for environment-aware adaptation and instantiates it with a highly parameter- and inference-efficient design for real-time prediction. Furthermore, SAG introduces a one-for-all reuse strategy that reuses activations across both timesteps and blocks in a zig-zag manner, minimizing global redundancy. Extensive experiments on multiple robotic benchmarks demonstrate that SAG achieves up to 4$\times$ generation speedup without sacrificing performance. Project Page: https://sparse-actiongen.github.io/.
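The prune-then-reuse idea can be sketched in a few lines: a mask (produced in SAG by the observation-conditioned pruner) marks, per denoising timestep and per network block, whether the block is recomputed or its cached activation is reused. The sketch below is a minimal toy illustration under our own assumptions; `block_forward`, `denoise`, and `reuse_mask` are hypothetical names, and the toy arithmetic stands in for a real diffusion transformer block, not the SAG implementation.

```python
def block_forward(block_id, x):
    # Hypothetical stand-in for an expensive transformer block;
    # a cheap arithmetic transform keeps the sketch self-contained.
    return [v + 0.1 * (block_id + 1) for v in x]

def denoise(x, num_steps, num_blocks, reuse_mask):
    """Run a toy denoising loop with cache-based activation reuse.

    reuse_mask[t][b] == True means: skip block b at timestep t and
    substitute its most recently cached output instead of recomputing.
    Returns the final activation and the number of block computations.
    """
    cache = {}   # block_id -> last computed activation
    computed = 0
    for t in range(num_steps):
        for b in range(num_blocks):
            if reuse_mask[t][b] and b in cache:
                x = cache[b]           # reuse cached activation
            else:
                x = block_forward(b, x)
                cache[b] = x           # refresh the cache
                computed += 1
    return x, computed

# Dense baseline: recompute everything at every step.
dense = [[False] * 2 for _ in range(4)]
# Sparse schedule: reuse all blocks on every other timestep.
sparse = [[t % 2 == 1 for _ in range(2)] for t in range(4)]

_, cost_dense = denoise([0.0], 4, 2, dense)    # 8 block calls
_, cost_sparse = denoise([0.0], 4, 2, sparse)  # 4 block calls
```

In this toy setting the sparse schedule halves the number of block evaluations; SAG's speedup instead comes from a learned, rollout-adaptive mask rather than the fixed alternating pattern used here for illustration.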