ROBOGATE: Adaptive Failure Discovery for Safe Robot Policy Deployment via Two-Stage Boundary-Focused Sampling

Deploying learned robot manipulation policies in industrial settings requires rigorous pre-deployment validation, yet exhaustive testing across high-dimensional parameter spaces is intractable. We present ROBOGATE, a deployment risk management framework that combines physics-based simulation with a two-stage adaptive sampling strategy to efficiently discover failure boundaries in the operational parameter space. Stage 1 employs Latin Hypercube Sampling (LHS) across an 8-dimensional parameter space to establish a coarse failure landscape from 20,000 uniformly distributed experiments. Stage 2 applies boundary-focused sampling that concentrates 10,000 additional experiments in the transition zone where the success rate is between 30% and 70%, enabling precise failure boundary mapping. Using NVIDIA Isaac Sim with Newton physics, we evaluate a scripted pick-and-place controller on two robot embodiments, Franka Panda (7-DOF) and UR5e (6-DOF), across 30,000 total experiments. Our logistic regression risk model achieves an AUC of 0.780 on the combined dataset (vs. 0.754 for Stage 1 alone), yields a closed-form failure boundary equation, and reveals four universal danger zones affecting both robot platforms. We further demonstrate the framework on Vision-Language-Action (VLA) model evaluation, where Octo-Small achieves a 0.0% success rate on 68 adversarial scenarios versus 100% for the scripted baseline, a gap of 100 percentage points that underscores the challenge of deploying foundation models in industrial settings. ROBOGATE is open-source and runs on a single GPU workstation.
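
To make the two-stage idea concrete, the following is a minimal sketch in plain Python, not the ROBOGATE implementation. It assumes a toy success function in place of the Isaac Sim rollouts, a k-nearest-neighbour estimate of the local success rate, and simple rejection sampling to keep Stage 2 candidates inside the 30-70% band. All function names, the toy coefficients, and the reduced sample counts (400 and 100 rather than 20,000 and 10,000) are hypothetical choices made for illustration.

```python
import math
import random


def latin_hypercube(n, dims, rng):
    """Return n points in [0, 1)^dims with exactly one point per stratum on every axis."""
    columns = []
    for _ in range(dims):
        # One draw inside each of the n equal-width strata, then shuffle the order.
        column = [(i + rng.random()) / n for i in range(n)]
        rng.shuffle(column)
        columns.append(column)
    return [tuple(column[i] for column in columns) for i in range(n)]


def toy_success_probability(x):
    """Hypothetical stand-in for the simulator: success gets less likely as x[0] and x[1] grow."""
    z = 4.0 - 5.0 * x[0] - 3.0 * x[1]
    return 1.0 / (1.0 + math.exp(-z))


def run_trial(x, rng):
    """Simulate one pass/fail experiment at parameter vector x."""
    return rng.random() < toy_success_probability(x)


def local_success_rate(x, points, outcomes, k=25):
    """Estimate the success rate at x from the k nearest already-tested points."""
    nearest = sorted(range(len(points)), key=lambda i: math.dist(x, points[i]))[:k]
    return sum(outcomes[i] for i in nearest) / len(nearest)


def boundary_focused_samples(points, outcomes, n_new, rng, lo=0.3, hi=0.7, max_tries=10000):
    """Rejection-sample candidates whose estimated success rate lies in [lo, hi]."""
    dims = len(points[0])
    accepted = []
    for _ in range(max_tries):
        if len(accepted) == n_new:
            break
        candidate = tuple(rng.random() for _ in range(dims))
        if lo <= local_success_rate(candidate, points, outcomes) <= hi:
            accepted.append(candidate)
    return accepted


if __name__ == "__main__":
    rng = random.Random(0)
    # Stage 1: coarse, space-filling coverage of the 8-dimensional unit cube.
    stage1 = latin_hypercube(400, 8, rng)
    stage1_outcomes = [run_trial(x, rng) for x in stage1]
    # Stage 2: spend the remaining budget where outcomes are most uncertain.
    stage2 = boundary_focused_samples(stage1, stage1_outcomes, 100, rng)
    stage2_outcomes = [run_trial(x, rng) for x in stage2]
    print(len(stage1), "coarse trials,", len(stage2), "boundary-focused trials")
```

The combined Stage 1 and Stage 2 outcomes would then be the training set for a risk model such as the logistic regression mentioned above. The `max_tries` cap is a guard for the sketch: if no region falls inside the band, the function returns fewer points instead of looping forever.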
