Uncertainty in perception, actuation, and the environment often requires multiple attempts for a robotic task to succeed. We study a class of problems that provide (1) low-entropy indicators of terminal success or failure, and (2) unreliable (high-entropy) data for predicting the final outcome of an ongoing task. Examples include a robot trying to connect with a charging station, parallel parking, or assembling a tightly fitting part. Restarting after predicting failure early, rather than simply running to failure, can significantly decrease the makespan, that is, the total time to completion, with the drawback of potentially cutting short an otherwise successful operation. Assuming task running times to be Poisson distributed, and using a Markov jump process to capture the dynamics of the underlying Markov decision process, we derive a closed-form solution that predicts makespan from the confusion matrix of the failure predictor. This allows the robot to learn failure prediction in a production environment and to adopt a preemptive policy only when it actually saves time. We demonstrate this approach on a peg-in-hole assembly problem with a real robotic system. Failures are predicted from force-torque data by a dilated convolutional network, yielding an average makespan reduction from 101s to 81s (N=120, p<0.05). We posit that the proposed algorithm generalizes to any robotic behavior with an unambiguous terminal reward, with wide-ranging applications to how robots can learn and improve their behaviors in the wild.
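The trade-off between restarting early and running to failure can be illustrated with a much simpler model than the one derived in the paper. The sketch below is not the paper's closed-form solution: it assumes independent attempts with a fixed success probability, fixed (not Poisson-distributed) durations, and a predictor that is queried once per attempt at a fixed time. The function names, parameters, and all numbers are hypothetical and chosen only for illustration. Under these assumptions, the expected makespan is the expected cost of one attempt divided by the probability that an attempt finishes the task.

```python
def expected_makespan(p_success, t_success, t_fail,
                      tpr=0.0, fpr=0.0, t_check=0.0):
    """Expected total time until the first completed attempt.

    Simplified renewal model (an assumption, not the paper's derivation):
      p_success : probability that an attempt would succeed if left alone
      t_success : duration of an attempt that runs to success
      t_fail    : duration of an attempt that runs to failure
      tpr       : P(predictor flags failure | attempt would fail)
      fpr       : P(predictor flags failure | attempt would succeed)
      t_check   : time at which a flagged attempt is aborted and restarted
    With tpr = fpr = 0 this reduces to the run-to-failure policy.
    """
    if not 0.0 < p_success <= 1.0:
        raise ValueError("p_success must be in (0, 1]")
    # An attempt ends the task only if it would succeed and is not preempted.
    p_done = p_success * (1.0 - fpr)
    if p_done <= 0.0:
        raise ValueError("every successful attempt is preempted; task never ends")
    # Expected time spent in a single attempt, over the four confusion-matrix cells.
    cost = (p_success * (1.0 - fpr) * t_success          # true negative: completes
            + p_success * fpr * t_check                  # false positive: wasted restart
            + (1.0 - p_success) * tpr * t_check          # true positive: early restart
            + (1.0 - p_success) * (1.0 - tpr) * t_fail)  # false negative: runs to failure
    return cost / p_done


def should_preempt(p_success, t_success, t_fail, tpr, fpr, t_check):
    """Adopt the preemptive policy only if it lowers the expected makespan."""
    baseline = expected_makespan(p_success, t_success, t_fail)
    preemptive = expected_makespan(p_success, t_success, t_fail,
                                   tpr, fpr, t_check)
    return preemptive < baseline
```

With these hypothetical inputs, a perfect predictor (`tpr=1`, `fpr=0`) always helps when `t_check < t_fail`, whereas a predictor no better than chance can increase the expected makespan because it interrupts attempts that would have succeeded. This mirrors the abstract's point that preemption should be adopted only when the predictor's confusion matrix indicates a net time saving.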