Vision-guided robot grasping based on Deep Neural Networks (DNNs) generalizes well but poses safety risks in human-robot interaction (HRI). Recent works address this by designing benign adversarial attacks and patches in the RGB modality, yet their depth-independent nature limits their effectiveness on RGB-D inputs. In this work, we propose the Multimodal Adversarial Quality Policy (MAQP) to realize multimodal safe grasping. Our framework introduces two key components. First, the Heterogeneous Dual-Patch Optimization Scheme (HDPOS) mitigates the distribution discrepancy between the RGB and depth modalities during patch generation by adopting modality-specific initialization strategies, a Gaussian distribution for depth patches and a uniform distribution for RGB patches, while jointly optimizing both modalities under a unified objective function. Second, the Gradient-Level Modality Balancing Strategy (GLMBS) resolves the optimization imbalance between the RGB and depth patches during patch shape adaptation by reweighting gradient contributions according to a per-channel sensitivity analysis and applying distance-adaptive perturbation bounds. Extensive experiments on benchmark datasets and a collaborative robot demonstrate the effectiveness of MAQP.
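The two components above can be illustrated with a minimal NumPy sketch. All shapes, value ranges, and the specific balancing rule (dividing each modality's gradient by its mean per-channel magnitude) are illustrative assumptions, not the paper's exact formulation; the dummy gradients stand in for backpropagation through a grasp-quality network.

```python
import numpy as np

rng = np.random.default_rng(0)

# HDPOS-style modality-specific initialization (hypothetical shapes/ranges):
# RGB patch drawn from a uniform distribution over valid intensities,
# depth patch from a Gaussian centred on a nominal working distance.
patch_hw = (32, 32)
rgb_patch = rng.uniform(0.0, 1.0, size=(3, *patch_hw))             # uniform init
depth_patch = rng.normal(loc=0.5, scale=0.1, size=(1, *patch_hw))  # Gaussian init
depth_patch = np.clip(depth_patch, 0.0, 1.0)

# GLMBS-style gradient reweighting (illustrative): scale each modality's
# gradient by the inverse of its per-channel sensitivity (mean |grad|),
# so neither modality dominates the joint update.
def balance_gradients(grad_rgb, grad_depth, eps=1e-8):
    s_rgb = np.abs(grad_rgb).mean(axis=(1, 2), keepdims=True)      # per-channel sensitivity
    s_depth = np.abs(grad_depth).mean(axis=(1, 2), keepdims=True)
    return grad_rgb / (s_rgb + eps), grad_depth / (s_depth + eps)

# Dummy gradients: RGB gradients are orders of magnitude larger than
# depth gradients, mimicking the imbalance GLMBS is designed to fix.
g_rgb = rng.normal(0.0, 2.0, size=rgb_patch.shape)
g_depth = rng.normal(0.0, 0.01, size=depth_patch.shape)
g_rgb_b, g_depth_b = balance_gradients(g_rgb, g_depth)

# One joint signed-gradient step under the unified objective,
# clipped back to the valid range of each modality.
step = 0.01
rgb_patch = np.clip(rgb_patch - step * np.sign(g_rgb_b), 0.0, 1.0)
depth_patch = np.clip(depth_patch - step * np.sign(g_depth_b), 0.0, 1.0)
```

After balancing, each channel's mean gradient magnitude is normalized to roughly one, so both patches receive update steps of comparable size despite the raw gradients differing by two orders of magnitude.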