Interacting with real-world cluttered scenes poses several challenges to robotic agents, which must understand complex spatial dependencies among the observed objects to determine optimal pick sequences or efficient object-retrieval strategies. Existing solutions typically handle simplified scenarios and focus on predicting pairwise object relationships after an initial object detection phase, but they often overlook the global context or struggle with redundant and missing object relations. In this work, we present a modern take on visual relational reasoning for grasp planning. We introduce D3GD, a novel testbed that includes bin-picking scenes with up to 35 objects from 97 distinct categories. Additionally, we propose D3G, a new end-to-end transformer-based dependency graph generation model that simultaneously detects objects and produces an adjacency matrix representing their spatial relationships. Recognizing the limitations of standard metrics, we employ the Average Precision of Relationships for the first time to evaluate model performance, conducting an extensive experimental benchmark. The results establish our approach as the new state of the art for this task, laying the foundation for future research in robotic manipulation. We publicly release the code and dataset at https://paolotron.github.io/d3g.github.io.