Contact variability, sensing uncertainty, and external disturbances make grasp execution stochastic. Expected-quality objectives ignore tail outcomes and often select grasps that fail under adverse contact realizations. Risk-sensitive POMDPs address this failure mode, but many use particle-filter beliefs that scale poorly, obstruct gradient-based optimization, and estimate Conditional Value-at-Risk (CVaR) with high-variance approximations. We instead formulate grasp acquisition as variational inference over latent contact parameters and object pose, representing the belief with a differentiable Gaussian mixture. We use Gumbel-Softmax component selection and location-scale reparameterization to express samples as smooth functions of the belief parameters, enabling pathwise gradients through a differentiable CVaR surrogate for direct optimization of tail robustness. In simulation, our variational neural belief improves robust grasp success under contact-parameter uncertainty and exogenous force perturbations while reducing planning time by roughly an order of magnitude relative to particle-filter model-predictive control. On a serial-chain robot arm with a multifingered hand, we validate grasp-and-lift success under object-pose uncertainty against a Gaussian baseline. Both methods succeed on the tested perturbations, but our controller terminates in fewer steps and less wall-clock time while achieving a higher tactile grasp-quality proxy. Our learned belief also calibrates risk more accurately, keeping mean absolute calibration error below 0.14 across tested simulation regimes, compared with 0.58 for a Cross-Entropy Method planner.
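The sampling and risk steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a one-dimensional latent, a relaxed mixture sample formed by blending components with Gumbel-Softmax weights, and a softplus-smoothed Rockafellar-Uryasev objective as the CVaR surrogate. All names (`sample_belief`, `cvar_surrogate`, `grasp_target`), the quadratic loss, and the numeric values are hypothetical. Plain Python is used for portability, so the sketch shows only the forward computation; an autodiff framework would be needed to take the pathwise gradients.

```python
import math
import random


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def softplus(x, beta):
    """Smooth stand-in for max(x, 0); larger beta gives a tighter fit."""
    z = beta * x
    return (max(z, 0.0) + math.log1p(math.exp(-abs(z)))) / beta


def draw_noise(n_samples, n_components, seed=0):
    """Draw parameter-free noise once: Gumbel per component, one Gaussian per sample."""
    rng = random.Random(seed)
    noise = []
    for _ in range(n_samples):
        gumbel = []
        for _ in range(n_components):
            u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep log() finite
            gumbel.append(-math.log(-math.log(u)))
        noise.append((gumbel, rng.gauss(0.0, 1.0)))
    return noise


def sample_belief(logits, means, log_scales, noise, tau=0.5):
    """Relaxed mixture samples as a deterministic, smooth function of the parameters."""
    samples = []
    for gumbel, eps in noise:
        # Gumbel-Softmax: soft component-selection weights at temperature tau.
        weights = softmax([(l + g) / tau for l, g in zip(logits, gumbel)])
        # Location-scale reparameterization per component, blended by the weights.
        x = sum(w * (m + math.exp(s) * eps)
                for w, m, s in zip(weights, means, log_scales))
        samples.append(x)
    return samples


def cvar_surrogate(losses, alpha, t, beta=50.0):
    """Smoothed Rockafellar-Uryasev objective: t + E[softplus(L - t)] / (1 - alpha)."""
    mean_excess = sum(softplus(l - t, beta) for l in losses) / len(losses)
    return t + mean_excess / (1.0 - alpha)


# Hypothetical 1-D belief over a latent contact parameter (illustrative values).
logits = [0.0, 0.5, -0.3]
means = [-1.0, 0.2, 1.5]
log_scales = [-1.0, -0.5, -1.2]
noise = draw_noise(n_samples=1000, n_components=3)

samples = sample_belief(logits, means, log_scales, noise)
grasp_target = 0.2  # hypothetical action parameter
losses = [(x - grasp_target) ** 2 for x in samples]  # hypothetical grasp loss

alpha = 0.9
k = round((1.0 - alpha) * len(losses))
worst = sorted(losses, reverse=True)[:k]
t = worst[-1]                    # empirical alpha-quantile used as the threshold
tail_mean = sum(worst) / k       # empirical CVaR: mean of the worst k losses
surrogate = cvar_surrogate(losses, alpha, t)
print(f"empirical CVaR: {tail_mean:.4f}, smooth surrogate: {surrogate:.4f}")
```

Because the noise is drawn once and held fixed, every sample is a smooth function of `logits`, `means`, and `log_scales`, which is the property that pathwise gradients rely on. In this sketch the softplus surrogate upper-bounds the empirical CVaR and tightens as `beta` grows; in practice the threshold `t` would typically be optimized jointly with the other parameters rather than set from a sorted batch.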