Precise scene understanding is key for most robot monitoring and intervention tasks in agriculture. In this work we present PAg-NeRF, a novel NeRF-based system that enables 3D panoptic scene understanding. Our representation is trained on an image sequence with noisy robot odometry poses and automatic panoptic predictions whose instance IDs are inconsistent between frames. Despite this noisy input, our system outputs scene geometry, photo-realistic renders, and 3D-consistent panoptic representations with consistent instance IDs. We evaluate the system in a very challenging horticultural scenario and, in doing so, demonstrate an end-to-end trainable system that can use noisy robot poses rather than precise poses that have to be pre-calculated. Compared to a baseline approach, the peak signal-to-noise ratio improves from 21.34 dB to 23.37 dB, while the panoptic quality improves from 56.65% to 70.08%. Furthermore, our approach is faster and can be tuned to speed up inference by more than a factor of 2, while being memory efficient with approximately 12 times fewer parameters.
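For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how peak signal-to-noise ratio (PSNR) and panoptic quality (PQ) are commonly computed. It uses the standard textbook definitions and is not taken from the PAg-NeRF evaluation code, which the abstract does not describe. The function names `psnr` and `panoptic_quality`, the assumption that images are scaled to [0, 1], and the assumption that predicted and ground-truth segments have already been matched (conventionally at IoU > 0.5) are all illustrative choices.

```python
import numpy as np


def psnr(rendered, reference, max_value=1.0):
    """PSNR in dB between two images of identical shape.

    Assumes pixel values lie in [0, max_value] (hypothetical convention).
    """
    rendered = np.asarray(rendered, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)
    mse = np.mean((rendered - reference) ** 2)
    if mse == 0.0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)


def panoptic_quality(matched_ious, num_false_positives, num_false_negatives):
    """PQ = sum of IoUs over matched segments / (TP + 0.5 * FP + 0.5 * FN).

    `matched_ious` holds the IoU of each true-positive match; the matching
    itself is assumed to have been done beforehand.
    """
    num_true_positives = len(matched_ious)
    denominator = (
        num_true_positives
        + 0.5 * num_false_positives
        + 0.5 * num_false_negatives
    )
    if denominator == 0:
        return 0.0  # no segments at all; convention chosen for this sketch
    return float(sum(matched_ious)) / denominator


if __name__ == "__main__":
    reference = np.zeros((4, 4))
    rendered = np.full((4, 4), 0.1)
    print(f"PSNR: {psnr(rendered, reference):.2f} dB")
    print(f"PQ:   {panoptic_quality([0.8, 0.6], 1, 1):.4f}")
```

Because PSNR is logarithmic, the reported gain of roughly 2 dB corresponds to a noticeable reduction in mean squared error, and PQ jointly penalizes poor segment overlap and missed or spurious instances.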