High-throughput 3D shape completion of potato tubers on a harvester

Potato yield is an important metric that farmers use to optimize their cultivation practices. Yield can be estimated on a harvester with an RGB-D camera, from which the three-dimensional (3D) volume of individual potato tubers is estimated. A challenge, however, is that the 3D shape derived from RGB-D images is only partially complete, so the actual volume is underestimated. To address this issue, we developed a 3D shape completion network, called CoRe++, which completes the 3D shape from RGB-D images. CoRe++ is a deep learning network that consists of a convolutional encoder and a decoder. The encoder compresses RGB-D images into latent vectors, which the decoder uses to complete the 3D shape with a deep signed distance field network (DeepSDF). To evaluate CoRe++, we collected partial and complete 3D point clouds of 339 potato tubers on an operational harvester in Japan. On the 1425 RGB-D images in the test set (representing 51 unique potato tubers), our network achieved an average completion accuracy of 2.8 mm. For volume estimation, the root mean squared error (RMSE) was 22.6 ml, lower than the RMSE of linear regression (31.1 ml) and of the base model (36.9 ml). The RMSE could be further reduced to 18.2 ml when the 3D shape completion was performed in the center of the RGB-D image. With an average 3D shape completion time of 10 milliseconds per tuber, we conclude that CoRe++ is both fast and accurate enough to be implemented on an operational harvester for high-throughput potato yield estimation. Its high-throughput and accurate processing suggests that it could also be applied to other tuber, fruit and vegetable crops, enabling versatile, accurate and real-time yield monitoring in precision agriculture. Our code, network weights and dataset are publicly available at https://github.com/UTokyo-FieldPhenomics-Lab/corepp.git.
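To make the volume evaluation concrete, the following is a minimal sketch of how a volume could be read off a signed distance field and how the RMSE in ml is computed. It is not the CoRe++ implementation: the analytic `sphere_sdf` is a hypothetical stand-in for a trained DeepSDF decoder, voxel counting is an assumed (simple) volume estimator, and all function names and the example values are illustrative only.

```python
import math


def sphere_sdf(x, y, z, radius=3.0):
    """Hypothetical stand-in for a learned SDF: signed distance (cm) to a
    sphere centred at the origin, negative inside the surface."""
    return math.sqrt(x * x + y * y + z * z) - radius


def volume_from_sdf(sdf, half_extent, resolution):
    """Approximate the enclosed volume by counting voxel centres with sdf < 0.

    The cube [-half_extent, half_extent]^3 is split into resolution^3 voxels.
    With coordinates in cm, the result is in cm^3, which equals ml.
    """
    step = 2.0 * half_extent / resolution
    centres = [-half_extent + (i + 0.5) * step for i in range(resolution)]
    inside = 0
    for x in centres:
        for y in centres:
            for z in centres:
                if sdf(x, y, z) < 0.0:
                    inside += 1
    return inside * step ** 3


def rmse(predicted, reference):
    """Root mean squared error between two equally long sequences."""
    if len(predicted) != len(reference) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    estimated_ml = volume_from_sdf(sphere_sdf, half_extent=4.0, resolution=40)
    true_ml = 4.0 / 3.0 * math.pi * 3.0 ** 3
    print(f"estimated {estimated_ml:.1f} ml, analytic {true_ml:.1f} ml")
    # Made-up volumes (ml), purely to show the call; not data from the study.
    print(f"RMSE: {rmse([110.0, 95.0, 130.0], [105.0, 100.0, 128.0]):.2f} ml")
```

In this sketch the voxel size controls the trade-off between accuracy and run time; a mesh-based estimate (for example via marching cubes) would be an alternative, and which approach the authors use is not stated in the abstract.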
