Fine-grained grocery object recognition is an important computer vision problem with broad applications in automatic checkout, in-store robotic navigation, and assistive technologies for the visually impaired. Existing grocery datasets consist mainly of 2D images, so models trained on them are limited to learning features from regular 2D grids. While portable 3D sensors such as Kinect have long been commonly available, sensors such as LiDAR and TrueDepth have only recently been integrated into mobile phones. Despite the availability of mobile 3D sensors, there are currently no dedicated, real-world, large-scale benchmark 3D datasets for groceries. In addition, existing 3D datasets lack fine-grained grocery categories and have limited training samples. Furthermore, capturing data by moving around an object, rather than taking a traditional single photo, makes data collection cumbersome. We therefore introduce a large-scale grocery dataset called 3DGrocery100. It comprises 100 classes, with a total of 87,898 3D point clouds created from 10,755 RGB-D single-view images. We benchmark our dataset on six recent state-of-the-art 3D point cloud classification models. We also benchmark the dataset on few-shot and continual learning point cloud classification tasks. Project Page: https://bigdatavision.org/3DGrocery100/.