Transparent object perception is a crucial skill for applications such as robot manipulation in household and laboratory settings. Existing methods use RGB-D or stereo inputs to handle a subset of perception tasks, including depth and pose estimation. However, transparent object perception remains an open problem. In this paper, we forgo the unreliable depth maps from RGB-D sensors and extend the stereo-based method. Our proposed method, MVTrans, is an end-to-end multi-view architecture with multiple perception capabilities, including depth estimation, segmentation, and pose estimation. Additionally, we establish a novel procedural photo-realistic dataset generation pipeline and create a large-scale transparent object detection dataset, Syn-TODD, which is suitable for training networks with all three modalities: RGB-D, stereo, and multi-view RGB. Project Site: https://ac-rad.github.io/MVTrans/