Enhancing annotations for 5D apple pose estimation through 3D Gaussian Splatting (3DGS)

Automating tasks in orchards is challenging because of the large variation in the environment and frequent occlusions. One such challenge is apple pose estimation, where key points, such as the calyx, are often occluded. Recently developed pose estimation methods no longer rely on these key points, but still require them for annotations, which makes annotation difficult and time-consuming. Because of these occlusions, annotations of the same fruit can conflict or be missing across images. Novel 3D reconstruction methods can be used to simplify annotation and enlarge datasets. We propose a novel pipeline consisting of 3D Gaussian Splatting to reconstruct an orchard scene, simplified annotations, automated projection of the annotations to images, and the training and evaluation of a pose estimation method. Using our pipeline, 105 manual annotations were required to obtain 28,191 training labels, a reduction of 99.6%. Experimental results indicated that training with labels of fruits that are $\leq95\%$ occluded resulted in the best performance, with a neutral F1 score of 0.927 on the original images and 0.970 on the rendered images. Adjusting the size of the training dataset had only a small effect on model performance in terms of F1 score and pose estimation accuracy. The least occluded fruits had the best position estimates, and the estimates worsened as the fruits became more occluded. The tested pose estimation method was also unable to correctly learn the orientation of apples.
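
The projection step of the pipeline can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a standard pinhole camera model, and it assumes that a 5D pose means a 3D fruit center plus a unit direction for the stem-calyx axis (two orientation degrees of freedom). The names `project_annotation`, `annotation_reduction`, and `axis_len` are hypothetical and chosen only for illustration.

```python
import numpy as np


def project_annotation(center_w, axis_w, R_cw, t_cw, K, axis_len=0.04):
    """Project one 3D fruit annotation into a single image.

    center_w : (3,) fruit center in world coordinates (assumed annotation format)
    axis_w   : (3,) direction of the stem-calyx axis in world coordinates
    R_cw     : (3, 3) world-to-camera rotation
    t_cw     : (3,) world-to-camera translation
    K        : (3, 3) pinhole intrinsics
    axis_len : length in scene units used to draw the axis (hypothetical)

    Returns (center_px, tip_px, depth), or None if a point lies behind the camera.
    """
    center_w = np.asarray(center_w, dtype=float)
    axis_w = np.asarray(axis_w, dtype=float)
    axis_w = axis_w / np.linalg.norm(axis_w)

    # Two world points: the fruit center and a point along the fruit axis.
    pts_w = np.stack([center_w, center_w + axis_len * axis_w])

    # World frame to camera frame.
    pts_c = pts_w @ np.asarray(R_cw, dtype=float).T + np.asarray(t_cw, dtype=float)
    if np.any(pts_c[:, 2] <= 0):
        return None

    # Camera frame to pixel coordinates (perspective division).
    uvw = pts_c @ np.asarray(K, dtype=float).T
    uv = uvw[:, :2] / uvw[:, 2:3]
    return uv[0], uv[1], float(pts_c[0, 2])


def annotation_reduction(n_manual, n_labels):
    """Percentage of labels that did not have to be annotated by hand."""
    return 100.0 * (1.0 - n_manual / n_labels)


if __name__ == "__main__":
    K = np.array([[1000.0, 0.0, 640.0],
                  [0.0, 1000.0, 360.0],
                  [0.0, 0.0, 1.0]])
    result = project_annotation([0.0, 0.0, 2.0], [1.0, 0.0, 0.0],
                                np.eye(3), np.zeros(3), K)
    print(result)
    print(round(annotation_reduction(105, 28191), 1))
```

Repeating this projection for every camera pose in the reconstruction turns one 3D annotation into one label per image in which the fruit appears, which is how a small number of manual annotations can yield many training labels. Occlusion filtering, such as the $\leq95\%$ threshold reported above, would need an additional visibility test that this sketch does not include.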
