Category-level Shape Estimation for Densely Cluttered Objects

Accurately estimating the shape of objects in dense clutter is important for robotic packing, because finding the optimal object arrangement requires the robot planner to know the shape of every object present. However, objects to be packed are usually piled in dense clutter with severe occlusion, and object shape varies significantly across instances of the same category. These two factors respectively cause large object segmentation errors and inaccurate shape recovery on unseen instances, both of which degrade shape estimation performance during deployment. In this paper, we propose a category-level shape estimation method for densely cluttered objects. Our framework segments each object in the clutter by fusing multi-view visual information to achieve high segmentation accuracy, and recovers the instance shape by deforming category templates with diverse geometric transformations to improve generalization. Specifically, we first collect multi-view RGB-D images of the object clutter for point cloud reconstruction. We then fuse the feature maps representing the visual information of the multi-view RGB images with the pixel affinity learned from the clutter point cloud, and project the resulting instance segmentation masks of the multi-view RGB images onto the clutter point cloud to partition it. Finally, instance geometry information is obtained from the partially observed instance point cloud and the corresponding category template, and the deformation parameters with respect to the template are predicted for shape estimation. Experiments in simulation and in the real world show that our method achieves high shape estimation accuracy for densely cluttered everyday objects with various shapes.
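The geometric part of the pipeline (lifting multi-view depth to a point cloud, then projecting per-view instance masks onto that cloud) can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: it assumes a pinhole camera with known intrinsics `K` and known 4x4 camera poses, takes the instance masks as given integer maps (with -1 as background) instead of predicting them with the learned feature fusion, ignores occlusion when projecting, and combines views by a simple majority vote. The function names `backproject`, `label_points`, and `vote` are hypothetical.

```python
import numpy as np


def backproject(depth, K, cam_to_world):
    """Lift a depth map (H, W) to world-frame points (N, 3); zero depth is skipped."""
    h, w = depth.shape
    v, u = np.indices((h, w))          # pixel rows (v) and columns (u)
    valid = depth > 0
    z = depth[valid]
    x = (u[valid] - K[0, 2]) * z / K[0, 0]
    y = (v[valid] - K[1, 2]) * z / K[1, 1]
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=1)
    return (pts_cam @ cam_to_world.T)[:, :3]


def label_points(points, mask, K, world_to_cam):
    """Project world points into one view and read instance ids from its mask.

    Returns -1 for points behind the camera or outside the image.
    Assumption: no occlusion test is performed in this sketch.
    """
    homo = np.hstack([points, np.ones((len(points), 1))])
    cam = (homo @ world_to_cam.T)[:, :3]
    labels = np.full(len(points), -1, dtype=int)
    in_front = cam[:, 2] > 1e-6
    uv = cam[in_front, :2] / cam[in_front, 2:3]
    u = np.round(uv[:, 0] * K[0, 0] + K[0, 2]).astype(int)
    v = np.round(uv[:, 1] * K[1, 1] + K[1, 2]).astype(int)
    h, w = mask.shape
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    idx = np.flatnonzero(in_front)[inside]
    labels[idx] = mask[v[inside], u[inside]]
    return labels


def vote(labels_per_view, num_instances):
    """Majority vote over views; labels_per_view has shape (num_views, N)."""
    n_points = labels_per_view.shape[1]
    votes = np.zeros((n_points, num_instances), dtype=int)
    for labels in labels_per_view:
        seen = labels >= 0
        np.add.at(votes, (np.flatnonzero(seen), labels[seen]), 1)
    out = votes.argmax(axis=1)
    out[votes.sum(axis=1) == 0] = -1   # points never labeled in any view
    return out
```

In use, one would call `backproject` for every view and concatenate the results into the clutter point cloud, call `label_points` once per view, stack the label arrays, and pass them to `vote` to obtain a per-point instance id. A real system would additionally need a visibility check and consistent instance ids across views.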
