Existing equivariant methods improve data efficiency, but they are computationally intensive, rely on single-modality inputs, and become unstable when combined with fast-sampling methods. In this work, we propose E3Flow, a novel framework that addresses these limitations of equivariant diffusion policies and, for the first time, unifies efficient rectified flow with stable, multi-modal equivariant learning. Our framework is built upon spherical harmonic representations to ensure rigorous SO(3) equivariance. We introduce a novel invariant Feature Enhancement Module (FEM) that dynamically fuses hybrid visual modalities (point clouds and images), injecting rich visual cues into the spherical harmonic features. We evaluate E3Flow on 8 manipulation tasks from the MimicGen benchmark and further conduct 4 real-world experiments to validate its effectiveness in physical environments. Simulation results show that E3Flow achieves a 3.12% improvement in average success rate over the state-of-the-art Spherical Diffusion Policy (SDP) while simultaneously delivering a 7x inference speedup. E3Flow thus demonstrates a new and highly effective trade-off among task performance, inference efficiency, and data efficiency for robotic policy learning. Code: https://github.com/zql-kk/E3Flow.
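To give some intuition for why rectified flow pairs naturally with fast sampling and with SO(3) equivariance, the following is a minimal toy sketch and not the E3Flow implementation. It assumes a single 3-D target action and uses the closed-form straight-line velocity in place of a learned network; the names `toy_velocity`, `euler_sample`, and `rotation_z` are hypothetical and chosen only for illustration.

```python
import numpy as np


def toy_velocity(x, t, target):
    """Closed-form velocity of the straight path from noise to one target.

    Assumption: a learned policy would replace this with a network output.
    """
    return (target - x) / (1.0 - t)


def euler_sample(x0, target, num_steps):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with explicit Euler steps."""
    x = np.asarray(x0, dtype=float).copy()
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        x = x + dt * toy_velocity(x, t, target)
    return x


def rotation_z(angle):
    """Rotation matrix about the z-axis, an element of SO(3)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


rng = np.random.default_rng(0)
noise = rng.standard_normal(3)          # initial sample at t = 0
target = np.array([0.3, -0.1, 0.5])     # hypothetical target action

# Because the path is a straight line, one Euler step already reaches the target.
one_step = euler_sample(noise, target, num_steps=1)
ten_step = euler_sample(noise, target, num_steps=10)

# Equivariance check: rotating the inputs rotates the sampled action.
R = rotation_z(0.7)
rotated = euler_sample(R @ noise, R @ target, num_steps=10)

print(np.allclose(one_step, target), np.allclose(ten_step, target))
print(np.allclose(rotated, R @ ten_step))
```

In this idealized setting the number of integration steps does not affect the result, which is the property that makes few-step sampling attractive. A learned velocity field is only approximately straight, so a practical policy would typically still use a small number of steps rather than one.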