Transparent objects are widely used in our daily lives, making it important to teach robots to interact with them. However, this is challenging because reflective and refractive effects can prevent depth cameras from producing accurate geometry measurements. To address this problem, this paper introduces RFTrans, an RGB-D-based method for surface normal estimation and manipulation of transparent objects. By leveraging refractive flow as an intermediate representation, the proposed method circumvents the drawbacks of directly predicting geometry (e.g., surface normals) from images and helps bridge the sim-to-real gap. It integrates RFNet, which predicts refractive flow, object mask, and boundaries, followed by F2Net, which estimates surface normals from the refractive flow. To enable manipulation, a global optimization module takes in these predictions, refines the raw depth, and constructs a point cloud with normals. An off-the-shelf analytical grasp planning algorithm then generates grasp poses. We build a synthetic dataset rendered with physically plausible ray tracing to train the networks. Results show that the proposed method, trained only on the synthetic dataset, consistently outperforms the baseline method on both synthetic and real-world benchmarks by a large margin. Finally, a real-world robot grasping task achieves an 83% success rate, demonstrating that refractive flow enables direct sim-to-real transfer. The code, data, and supplementary materials are available at https://rftrans.robotflow.ai.
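To make the two-stage design concrete, the sketch below illustrates how RFNet and F2Net might chain together at inference time. It is a minimal, assumed interface: the module internals, channel layouts, and input sizes are placeholders for illustration, not the authors' actual implementation.

```python
# Minimal sketch of the RFTrans two-stage inference pipeline (assumed
# interfaces; the real networks are far more elaborate).
import torch
import torch.nn as nn

class RFNet(nn.Module):
    """Stage 1 (assumed): RGB -> refractive flow, object mask, boundary."""
    def __init__(self):
        super().__init__()
        # Placeholder backbone producing 4 channels: 2 flow + mask + boundary.
        self.backbone = nn.Conv2d(3, 4, kernel_size=3, padding=1)

    def forward(self, rgb):
        feat = self.backbone(rgb)
        flow = feat[:, :2]                      # 2-channel refractive flow
        mask = torch.sigmoid(feat[:, 2:3])      # transparent-object mask
        boundary = torch.sigmoid(feat[:, 3:4])  # object boundary
        return flow, mask, boundary

class F2Net(nn.Module):
    """Stage 2 (assumed): refractive flow -> per-pixel surface normals."""
    def __init__(self):
        super().__init__()
        self.head = nn.Conv2d(2, 3, kernel_size=3, padding=1)

    def forward(self, flow):
        normal = self.head(flow)
        # Project to unit-length surface normals.
        return nn.functional.normalize(normal, dim=1)

if __name__ == "__main__":
    rgb = torch.rand(1, 3, 240, 320)            # dummy RGB frame
    flow, mask, boundary = RFNet()(rgb)
    normal = F2Net()(flow)
    # The normal map, together with the refined depth from the global
    # optimization module, would feed the grasp planner downstream.
    print(normal.shape)                          # torch.Size([1, 3, 240, 320])
```

The key design choice this reflects is that geometry is never regressed directly from RGB: the flow prediction sits between the two networks, which is what the abstract credits for the direct sim-to-real transfer.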