Robotic strawberry harvesting requires precise 6D pose estimation, but collecting 6D pose ground truth in real agricultural fields is inherently challenging. Existing strawberry 6D pose estimation studies have therefore relied mainly on synthetic data, often without sufficient scene-level realism, leaving their performance under real field conditions unquantified. In this work, we present, to the best of our knowledge, the first real-world 6D pose ground truth dataset of strawberries collected in actual agricultural fields (12,040 images). We also introduce a synthetic dataset rendered in NVIDIA Isaac Sim that features scene-level realism and domain randomization. Even with this improved simulation setup, our experiments show that a substantial sim-to-real gap persists, underscoring the necessity of real field data for reliable evaluation. We further quantify this gap with baseline 6D pose estimation results across backbone encoders, which serve as a reference for future work.