Inspired by traditional handmade crafts, in which a person improvises assemblies from the objects at hand, we formally introduce the Craft Assembly Task. It is a robotic assembly task that involves building an accurate representation of a given target object using the available objects, which do not directly correspond to its parts. In this work, we focus on selecting the subset of available objects for the final craft when the input is an RGB image of the target in the wild. We use a mask segmentation neural network to identify the visible parts and then retrieve labelled template meshes. These meshes undergo pose optimization to determine the most suitable template. We then simplify the parts of the transformed template mesh to primitive shapes such as cuboids or cylinders. Finally, we design a search algorithm that finds correspondences in the scene based on local and global proportions. For comparison, we develop baselines that consider all possible combinations and choose the highest-scoring combination under common metrics for foreground maps and mask accuracy. Our approach achieves results comparable to the baselines for two different scenes, and we show qualitative results for an implementation in a real-world scenario.
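To make the idea of matching by local and global proportions concrete, the following is a minimal sketch and not the search algorithm or scoring used in this work. It assumes each simplified part and each available object is summarised by three bounding dimensions, and it enumerates every assignment exhaustively, as the baselines do, but scores them with a hypothetical proportion cost. The names `local_proportions`, `assignment_cost`, and `best_assignment` are illustrative assumptions.

```python
from itertools import permutations
import math


def local_proportions(dims):
    """Sort one primitive's dimensions in descending order and divide by the largest."""
    d = sorted(dims, reverse=True)
    return [x / d[0] for x in d]


def assignment_cost(parts, objects):
    """Hypothetical cost of using objects[i] to stand in for parts[i] (lower is better)."""
    # Local term: does each object have the same shape ratios as its part?
    local = sum(
        abs(a - b)
        for p, o in zip(parts, objects)
        for a, b in zip(local_proportions(p), local_proportions(o))
    )
    # Global term: are the relative sizes between parts preserved?
    # Sizes are expressed relative to the first part, so overall scale cancels out.
    p_scale = [max(p) / max(parts[0]) for p in parts]
    o_scale = [max(o) / max(objects[0]) for o in objects]
    glob = sum(abs(math.log(a / b)) for a, b in zip(p_scale, o_scale))
    return local + glob


def best_assignment(parts, available):
    """Try every ordered choice of available objects and return the index tuple with the lowest cost."""
    candidates = permutations(range(len(available)), len(parts))
    return min(
        candidates,
        key=lambda idx: assignment_cost(parts, [available[i] for i in idx]),
    )


# Target simplified to two cuboids: an elongated body and a small cube.
parts = [(4, 2, 2), (1, 1, 1)]
# Objects found in the scene; none is a literal part of the target.
available = [(2, 2, 2), (8, 4, 4), (3, 3, 9)]
print(best_assignment(parts, available))
```

In this toy scene, the objects at indices 1 and 0 reproduce the target exactly at twice the scale, so their cost is zero. Exhaustive enumeration grows factorially with the number of objects, which is one reason a guided search is attractive.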