Adaptive robotics plays an essential role in achieving truly co-creative cyber-physical systems. In robotic manipulation tasks, one of the biggest challenges is estimating the pose of given workpieces. Although recent deep-learning-based models show promising results, they require immense datasets for training. In this paper, we propose two vision-based multi-object grasp pose estimation (MOGPE) models: MOGPE Real-Time and MOGPE High-Precision. Furthermore, we propose a sim2real method based on domain randomization to diminish the reality gap and overcome the data shortage. In a real-world robotic pick-and-place experiment, our methods achieved success rates of 80% and 96.67% with the MOGPE Real-Time and MOGPE High-Precision models, respectively. Our framework provides an industrial tool for fast data generation and model training and requires minimal domain-specific data.