The adoption of fisheye cameras in robotic manipulation, driven by their exceptionally wide Field of View (FoV), is rapidly outpacing a systematic understanding of their downstream effects on policy learning. This paper presents the first comprehensive empirical study to bridge this gap, rigorously analyzing the properties of wrist-mounted fisheye cameras for imitation learning. Through extensive experiments in both simulation and the real world, we investigate three critical research questions: spatial localization, scene generalization, and hardware generalization. Our investigation reveals that: (1) The wide FoV significantly enhances spatial localization, but this benefit is critically contingent on the visual complexity of the environment. (2) Fisheye-trained policies, while prone to overfitting in simple scenes, unlock superior scene generalization when trained with sufficient environmental diversity. (3) While naive cross-camera transfer leads to failures, we identify the root cause as scale overfitting and demonstrate that hardware generalization can be improved with a simple Random Scale Augmentation (RSA) strategy. Collectively, our findings provide concrete, actionable guidance for the large-scale collection and effective use of fisheye datasets in robotic learning. More results and videos are available at https://robo-fisheye.github.io/
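The abstract names Random Scale Augmentation (RSA) without detailing it. Below is a minimal, hypothetical sketch of what such an augmentation might look like: each training image is zoomed by a random factor about its center (with edge replication when zooming out), simulating the apparent-scale differences between camera models so the policy cannot overfit to one scale. All function names, parameters, and the nearest-neighbor resampling scheme are illustrative assumptions, not the paper's implementation.

```python
import random

def random_scale_augment(image, scale_range=(0.8, 1.25), rng=None):
    """Hypothetical RSA sketch: zoom a 2D image (H x W list of lists)
    by a random factor about its center using nearest-neighbor sampling.
    s > 1 zooms in (center crop); s < 1 zooms out (edges replicated)."""
    rng = rng or random.Random()
    h, w = len(image), len(image[0])
    s = rng.uniform(*scale_range)  # random zoom factor per sample
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0  # image center

    def clamp(v, lo, hi):
        return max(lo, min(hi, v))

    out = []
    for i in range(h):
        # map output row back to its source row under the zoom
        src_i = clamp(int(round((i - cy) / s + cy)), 0, h - 1)
        row = []
        for j in range(w):
            src_j = clamp(int(round((j - cx) / s + cx)), 0, w - 1)
            row.append(image[src_i][src_j])
        out.append(row)
    return out
```

In a real pipeline this transform would be applied per training example (alongside the usual photometric augmentations) so that each trajectory is seen at many apparent scales; with `scale_range=(1.0, 1.0)` it reduces to the identity.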