High-precision robotic manipulation requires fine-grained spatial reasoning, which RGB-only policies often struggle to achieve because of depth ambiguity and perspective scale issues. Policies that use 3D information directly, such as those based on point clouds, offer a stronger geometric prior than purely image-based ones, yet their performance remains highly task-dependent. We hypothesize that this discrepancy may stem from the spectral bias of neural networks towards learning low-frequency functions, which especially affects architectures conditioned on slowly varying Cartesian features. We therefore propose to map point clouds from Cartesian space into a high-dimensional Fourier space, giving the point cloud encoder direct access to high-frequency features. We experimentally validate Fourier features on challenging manipulation tasks from the RoboCasa and ManiSkill3 benchmarks and on a real robot setup. Despite their simplicity, Fourier features provide significant benefits across diverse encoder architectures and benchmarks, and they are robust to hyperparameter choices. Our results indicate that Fourier features let policies exploit geometric details more effectively than Cartesian features, showing their potential as a general-purpose tool for point cloud-based imitation learning. We provide source code and videos on our project page: https://fourier-il.github.io/fourier-il
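To make the Cartesian-to-Fourier mapping concrete, the following is a minimal sketch of one common way to build such features: an axis-aligned sinusoidal encoding with frequencies that double at each level. The abstract does not specify the exact mapping, so the function name `fourier_features`, the parameters `num_frequencies` and `include_input`, and the power-of-two frequency schedule are all assumptions made for illustration and may differ from the released code. Plain Python is used to keep the sketch dependency-free; a real pipeline would likely vectorize this on the GPU.

```python
import math


def fourier_features(points, num_frequencies=8, include_input=True):
    """Map 3D points to axis-aligned Fourier features (illustrative sketch).

    points: iterable of (x, y, z) coordinates.
    Returns one feature row per point. Each coordinate contributes
    num_frequencies sines followed by num_frequencies cosines.
    The frequency schedule 2**k * pi is an assumption, not the paper's.
    """
    freqs = [(2.0 ** k) * math.pi for k in range(num_frequencies)]
    rows = []
    for p in points:
        if len(p) != 3:
            raise ValueError("each point must have exactly 3 coordinates")
        # Optionally keep the raw Cartesian coordinates as the first 3 entries.
        row = list(p) if include_input else []
        for coord in p:
            row.extend(math.sin(f * coord) for f in freqs)
            row.extend(math.cos(f * coord) for f in freqs)
        rows.append(row)
    return rows


if __name__ == "__main__":
    cloud = [(0.0, 0.0, 0.0), (0.10, -0.20, 0.30)]
    feats = fourier_features(cloud, num_frequencies=4)
    # Each row has 3 raw coordinates plus 3 * 2 * 4 sinusoidal features.
    print(len(feats), len(feats[0]))
```

The intuition behind such an encoding: assuming coordinates are in meters, a 1 mm displacement changes a raw Cartesian input by only 0.001, whereas at the highest frequency used here with `num_frequencies=8` (2^7 · π ≈ 402 rad/m) it shifts the phase by roughly 0.4 rad, so small geometric differences become much easier for the encoder to distinguish.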