Robotic strawberry harvesting requires precise 6D pose estimation; however, collecting 6D pose ground truth in real agricultural fields is inherently challenging. Existing 6D pose estimation methods have therefore relied solely on synthetic data that lacks scene-level realism, leaving their performance under real agricultural field conditions unquantified. In this work, we present, to the best of our knowledge, the first real-world 6D pose ground truth dataset of strawberries collected in actual agricultural fields (12,040 images). We also introduce a synthetic dataset rendered in NVIDIA Isaac Sim, featuring scene-level realism and domain randomization. Nevertheless, our experiments reveal that a significant sim-to-real gap persists, underscoring the necessity of real agricultural field data for reliable evaluation. We further quantify the sim-to-real gap through baseline 6D pose estimation results across backbone encoders, serving as a reference for future work. The real-world dataset will be made available upon acceptance.