Spline Policy: A Structured Representation for Robot Policies

Modern imitation-learning policies for robot manipulation often represent actions as fixed-resolution action chunks, which are simple and effective but expose limited geometric and temporal structure before execution. This paper studies Spline Policy (SP), a structured representation that replaces action chunks with spline parameters while keeping the policy backbone unchanged. The predicted spline can be decoded as a compact continuous trajectory, queried at different temporal resolutions, constrained or edited in parameter space, and passed to downstream controllers. For quadratic spline outputs, the same representation can also be converted into a state-dependent vector field through an analytical distance-field construction. Under the regularity and projection assumptions of this construction, the induced dynamics do not increase the distance to the generated spline, yielding a principled local corrective mechanism around the predicted motion. The spline output further supports uncertainty propagation from observations to spline parameters, trajectories, and flow fields, and can be combined with classical control mechanisms such as null-space collision avoidance without retraining the policy backbone. We instantiate SP with diffusion, flow-matching, transformer-based, and vision-language-action backbones. Experiments in low-dimensional motion learning, simulated manipulation under matched backbones, dexterous manipulation, and real-robot case studies show that SP remains compatible with modern policy learners while exposing useful motion-structure properties, including compact decoding, temporal resampling, local correction around predicted motions, uncertainty evaluation, and controller compatibility.
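To make the "queried at different temporal resolutions" property concrete, the following is a minimal sketch of decoding one set of spline parameters at two sampling rates. It assumes a piecewise quadratic Bezier parameterization with shared segment endpoints; the abstract does not specify the spline basis, so this choice, the function name `decode_quadratic_spline`, and the `(S, 3, D)` parameter layout are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np


def decode_quadratic_spline(segments, num_samples):
    """Evaluate a piecewise quadratic Bezier spline at uniformly spaced times.

    segments: array of shape (S, 3, D) holding 3 control points per segment.
              Assumption: segments[i, 2] == segments[i + 1, 0] (C0 continuity).
    num_samples: number of trajectory points to decode (>= 2).
    Returns an array of shape (num_samples, D).
    """
    segments = np.asarray(segments, dtype=float)
    num_segments = segments.shape[0]

    # Global parameter u in [0, S]; the integer part selects the segment.
    u = np.linspace(0.0, 1.0, num_samples) * num_segments
    idx = np.minimum(u.astype(int), num_segments - 1)
    t = (u - idx)[:, None]  # local parameter in [0, 1]

    p0, p1, p2 = segments[idx, 0], segments[idx, 1], segments[idx, 2]
    # Quadratic Bernstein basis.
    return (1 - t) ** 2 * p0 + 2 * (1 - t) * t * p1 + t ** 2 * p2


# Hypothetical 2-D spline parameters standing in for a policy output.
params = np.array([
    [[0.0, 0.0], [1.0, 2.0], [2.0, 0.0]],
    [[2.0, 0.0], [3.0, -2.0], [4.0, 0.0]],
])

coarse = decode_quadratic_spline(params, 5)   # low-rate waypoints
fine = decode_quadratic_spline(params, 50)    # high-rate controller setpoints
```

Both calls read the same six control points; only the query grid changes. A fixed-resolution action chunk would instead need interpolation or a retrained output head to change its sampling rate.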
