Tool use often fails not because robots misidentify tools, but because grasps cannot withstand task-induced wrenches. Existing vision-language manipulation systems ground tools and contact regions from language yet select grasps under quasi-static or geometry-only assumptions. During interaction, inertial impulses and lever-arm amplification generate wrist torques and tangential loads that trigger slip and rotation. We introduce inverse Tool-use Planning (iTuP), which selects grasps by minimizing the predicted interaction wrench along a task-conditioned trajectory. From rigid-body mechanics, we derive torque, slip, and alignment penalties, and train a Stable Dynamic Grasp Network (SDG-Net) to approximate these trajectory-conditioned costs for real-time scoring. Across hammering, sweeping, knocking, and reaching tasks in simulation and on hardware, SDG-Net suppresses induced torque by up to 17.6%, shifts grasps below empirically observed instability thresholds, and improves real-world success by 17.5% over a compositional baseline. Improvements concentrate where wrench amplification dominates, showing that robot tool use requires wrench-aware grasp selection, not perception alone.
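The grasp-selection idea above can be sketched as a trajectory-conditioned cost that sums torque, slip, and alignment penalties over predicted interaction forces. This is a minimal illustrative sketch only: the function names, penalty weights, friction model, and toy force predictions are all assumptions, not the paper's SDG-Net implementation.

```python
import numpy as np

def wrench_cost(grasp_point, com, forces, mu=0.5,
                axis=np.array([0.0, 0.0, 1.0]),
                w_torque=1.0, w_slip=1.0, w_align=0.1):
    """Hypothetical wrench-aware grasp score (assumed form, not the paper's).

    grasp_point: 3-vector, candidate grasp location on the tool.
    com:         3-vector, tool center of mass (lever-arm reference).
    forces:      (T, 3) array of predicted interaction forces along the
                 task-conditioned trajectory.
    """
    lever = com - grasp_point                        # lever arm from grasp to CoM
    cost = 0.0
    for f in forces:
        tau = np.cross(lever, f)                     # induced wrist torque
        f_n = abs(f @ axis)                          # normal force at the grasp
        f_t = np.linalg.norm(f - (f @ axis) * axis)  # tangential (slip-driving) load
        slip = max(0.0, f_t - mu * f_n)              # friction-cone violation
        align = np.linalg.norm(lever)                # lever-arm length penalty
        cost += w_torque * np.linalg.norm(tau) + w_slip * slip + w_align * align
    return cost

# Select the grasp minimizing the predicted interaction wrench cost.
candidates = [np.array([0.0, 0.0, 0.0]), np.array([0.0, 0.0, 0.25])]
com = np.array([0.0, 0.0, 0.3])
forces = np.tile(np.array([5.0, 0.0, -2.0]), (10, 1))  # toy predicted loads
best = min(candidates, key=lambda g: wrench_cost(g, com, forces))
```

Under these toy loads the grasp nearer the center of mass wins, since the shorter lever arm shrinks the induced torque, mirroring the lever-arm amplification effect described above.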