Liquid perception is critical for robotic pouring tasks and usually requires robust visual detection of flowing liquid. Although recent works have shown promising results in liquid perception, they typically require labeled data for model training, a process that is both time-consuming and reliant on human labor. To this end, this paper proposes a simple yet effective framework, PourIt!, to serve as a tool for robotic pouring tasks. We design a simple data collection pipeline that needs only image-level labels, reducing the reliance on tedious pixel-wise annotations. A binary classification model is then trained to generate a Class Activation Map (CAM) that focuses on the visual difference between the two kinds of collected data, i.e., whether or not liquid drops are present. We also devise a feature contrast strategy to improve the quality of the CAM so that it covers the actual liquid regions entirely and tightly. The container pose is then used to facilitate 3D point cloud recovery of the detected liquid region. Finally, the liquid-to-container distance is calculated for visual closed-loop control of the physical robot. To validate the effectiveness of the proposed method, we also contribute a novel dataset for this task, named the PourIt! dataset. Extensive results on this dataset and on a physical Franka robot demonstrate the utility and effectiveness of our method in robotic pouring tasks. Our dataset, code, and pre-trained models will be available on the project page.
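To make the CAM step concrete, the following is a minimal sketch of the standard class activation map computation: a weighted sum of the final feature maps using the classifier weights of the "liquid" class, followed by a ReLU, normalization, and thresholding. It is not the PourIt! implementation and omits the feature contrast strategy. The function names, the toy feature maps, the weights, and the 0.5 threshold are all assumptions chosen for illustration.

```python
def class_activation_map(features, weights):
    """Weighted sum of K feature maps (each H x W), then ReLU and peak normalization.

    `features` and `weights` are hypothetical stand-ins for the last
    convolutional feature maps and the classifier weights of one class.
    """
    height, width = len(features[0]), len(features[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for fmap, w in zip(features, weights):
        for y in range(height):
            for x in range(width):
                cam[y][x] += w * fmap[y][x]
    # Keep only positive evidence for the class (ReLU).
    cam = [[max(v, 0.0) for v in row] for row in cam]
    # Scale to [0, 1] by the peak activation, if there is one.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam


def cam_to_mask(cam, threshold=0.5):
    """Binarize the CAM; the threshold value is an illustrative assumption."""
    return [[1 if v >= threshold else 0 for v in row] for row in cam]


# Toy example: two 2x3 feature maps; the first responds to a vertical stream.
features = [
    [[0.0, 1.0, 0.0],
     [0.0, 2.0, 0.0]],
    [[1.0, 0.0, 1.0],
     [1.0, 0.0, 1.0]],
]
weights = [1.0, -0.5]  # hypothetical classifier weights for the "liquid" class

cam = class_activation_map(features, weights)
mask = cam_to_mask(cam)
```

In this toy case the mask marks only the middle column, which is where the first feature map (the one with a positive class weight) is active.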
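The final step, computing a liquid-to-container distance for closed-loop control, can be sketched as follows. The abstract does not define how this distance is measured, so this example assumes one plausible choice: the horizontal offset between the lowest recovered liquid point and the center of the container opening. The function name, the choice of the z axis as "up", and the sample coordinates are hypothetical.

```python
import math


def liquid_to_container_offset(liquid_points, container_center, up_axis=2):
    """Horizontal distance from the lowest liquid point to the container center.

    Assumes all points are (x, y, z) tuples in the same frame, with
    `up_axis` pointing up. This definition of the distance is an
    assumption, not taken from the paper.
    """
    # The lowest point of the stream approximates where the liquid will land.
    tip = min(liquid_points, key=lambda p: p[up_axis])
    horizontal_axes = [i for i in range(3) if i != up_axis]
    return math.sqrt(
        sum((tip[i] - container_center[i]) ** 2 for i in horizontal_axes)
    )


# Hypothetical recovered liquid points (meters) and container opening center.
liquid_points = [(0.10, 0.00, 0.30), (0.10, 0.01, 0.20), (0.13, 0.04, 0.10)]
container_center = (0.10, 0.00, 0.05)

offset = liquid_to_container_offset(liquid_points, container_center)
```

A controller could, for instance, adjust the pouring pose to drive this offset toward zero; how PourIt! actually uses the distance in its control loop is not specified in the abstract.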