KAN We Flow? Advancing Robotic Manipulation with 3D Flow Matching via KAN & RWKV

Diffusion-based visuomotor policies excel at modeling action distributions but are inefficient at inference time: recursively denoising from noise to policy requires many steps and heavy UNet backbones, which hinders deployment on resource-constrained robots. Flow matching alleviates the sampling burden by learning a one-step vector field, yet prior implementations still inherit large UNet-style architectures. In this work, we present KAN-We-Flow, a flow-matching policy that draws on recent advances in Receptance Weighted Key Value (RWKV) and Kolmogorov-Arnold Networks (KAN) from vision to build a lightweight and highly expressive backbone for 3D manipulation. Concretely, we introduce an RWKV-KAN block: an RWKV layer first performs efficient time/channel mixing to propagate task context, and a subsequent GroupKAN layer applies learnable spline-based, groupwise functional mappings to perform feature-wise nonlinear calibration of the action mapping on the RWKV outputs. Moreover, we introduce Action Consistency Regularization (ACR), a lightweight auxiliary loss that enforces alignment between predicted action trajectories and expert demonstrations via Euler extrapolation, providing additional supervision that stabilizes training and improves policy precision. Without resorting to large UNets, our design reduces parameters by 86.8\%, maintains fast runtime, and achieves state-of-the-art success rates on the Adroit, Meta-World, and DexArt benchmarks. Our project page is available at this \href{https://zhihaochen-2003.github.io/KAN-We-Flow.github.io/}{\textcolor{red}{link}}.
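
To make the Euler-extrapolation idea behind ACR concrete, the following is a minimal NumPy sketch and not the paper's implementation. It assumes a linear interpolation path between Gaussian noise and the expert action chunk, a squared-error form for both losses, and an auxiliary weight `acr_weight`; the abstract does not specify these details. The names `flow_matching_losses` and `velocity_fn` are hypothetical, and `velocity_fn` stands in for the RWKV-KAN backbone.

```python
import numpy as np


def flow_matching_losses(velocity_fn, expert_actions, rng, acr_weight=0.1):
    """Flow-matching loss plus an Euler-extrapolation consistency term.

    expert_actions: array of shape (batch, horizon, action_dim).
    velocity_fn:    hypothetical callable (x_t, t) -> predicted velocity,
                    standing in for the policy backbone.
    acr_weight:     assumed weighting of the auxiliary term.
    """
    batch = expert_actions.shape[0]
    noise = rng.standard_normal(expert_actions.shape)
    # One random time per trajectory in [0, 1), broadcast over horizon and dims.
    t = rng.uniform(0.0, 1.0, size=(batch, 1, 1))

    # Assumed linear path from noise (t = 0) to the expert actions (t = 1).
    x_t = (1.0 - t) * noise + t * expert_actions
    target_velocity = expert_actions - noise

    pred_velocity = velocity_fn(x_t, t)
    fm_loss = np.mean((pred_velocity - target_velocity) ** 2)

    # A single Euler step from time t to time 1 along the predicted velocity
    # gives an extrapolated action trajectory, compared with the demonstration.
    extrapolated = x_t + (1.0 - t) * pred_velocity
    acr_loss = np.mean((extrapolated - expert_actions) ** 2)

    total = fm_loss + acr_weight * acr_loss
    return total, fm_loss, acr_loss


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    demo = rng.standard_normal((4, 8, 2))  # toy stand-in for expert action chunks
    # Placeholder model that always predicts zero velocity.
    total, fm, acr = flow_matching_losses(lambda x, t: np.zeros_like(x), demo, rng)
    print(f"total={total:.4f} fm={fm:.4f} acr={acr:.4f}")
```

Under these assumptions the extrapolation residual equals the velocity error scaled by (1 - t), so the auxiliary term supervises the same prediction directly in action space rather than in velocity space.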
