Multi-object grasping is a challenging task. It is important for the energy- and cost-efficient operation of industrial crane manipulators, such as those that collect tree logs from the forest floor onto forest machines. In this work, we used synthetic data from physics simulations to explore how data-driven modeling can infer multi-object grasp poses from images. We showed that convolutional neural networks can be trained specifically for synthesizing multi-object grasps. Using RGB-Depth images and instance segmentation masks as input, a U-Net model outputs grasp maps with corresponding grapple orientation and opening width. Given an observation of a pile of logs, the model synthesizes and rates candidate grasp poses and selects the most suitable one, while also allowing changing operational constraints, such as lift capacity and reach, to be respected. When tested on previously unseen data, the proposed model found successful grasp poses with an accuracy of 95%.
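The grasp-selection step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the model outputs per-pixel maps of grasp quality, grapple orientation, and opening width, and that an operational constraint (here, a hypothetical maximum opening width) masks out infeasible grasps before the highest-rated one is chosen.

```python
import numpy as np

def select_grasp(quality, orientation, width, max_width):
    """Pick the highest-rated grasp whose opening width respects a constraint.

    quality, orientation, width: HxW maps, as a U-Net-style model might
    output (hypothetical names; the paper's exact outputs may differ).
    max_width: operational limit on the grapple opening (illustrative).
    """
    feasible = width <= max_width              # respect grapple opening limit
    scores = np.where(feasible, quality, -np.inf)
    idx = np.unravel_index(np.argmax(scores), scores.shape)
    return idx, orientation[idx], width[idx]

# Toy 2x2 example: the top-right grasp is best-rated but infeasible,
# so the bottom-left feasible grasp is selected instead.
q = np.array([[0.2, 0.9], [0.7, 0.1]])
theta = np.array([[0.0, 1.2], [0.5, 0.3]])
w = np.array([[0.4, 1.5], [0.6, 0.2]])
pos, ang, wid = select_grasp(q, theta, w, max_width=1.0)
# pos == (1, 0): the feasible grasp with the highest quality score
```

Other constraints, such as lift capacity or reach, could be folded into the feasibility mask in the same way.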