Affordances are a fundamental concept in robotics because they relate the actions available to an agent to its sensorimotor capabilities and its environment. We present a novel Bayesian deep network that detects affordances in images while quantifying the spatial distribution of the aleatoric and epistemic variance. We adapt the Mask-RCNN architecture to learn a probabilistic representation using Monte Carlo dropout. Our results outperform state-of-the-art deterministic networks. We attribute this improvement to a better probabilistic feature-space representation in the encoder and to the Bayesian variability induced during mask generation, which adapts better to object contours. We also introduce a new Probability-based Mask Quality measure that reveals the semantic and spatial differences in a probabilistic instance segmentation model. It modifies the existing Probabilistic Detection Quality metric by comparing binary masks rather than predicted bounding boxes, which yields a finer-grained evaluation of probabilistic segmentation. We find aleatoric variance along object contours due to camera noise, while epistemic variance appears in visually challenging pixels.
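To make the idea of per-pixel aleatoric and epistemic variance under Monte Carlo dropout more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a common decomposition for Bernoulli (mask) outputs in which the aleatoric term is the mean of p(1 - p) over stochastic passes and the epistemic term is the variance of p across passes; the abstract does not state which estimator the paper uses. The toy model, the function names, and the dropout rate are all hypothetical stand-ins for a Mask-RCNN mask head with dropout kept active at inference.

```python
import numpy as np


def mc_dropout_mask_uncertainty(stochastic_forward, image, num_samples=20):
    """Run several stochastic passes and split the per-pixel variance.

    stochastic_forward: callable returning an (H, W) array of mask
    probabilities, with dropout still active (hypothetical interface).
    """
    # Shape (T, H, W): one probability map per stochastic forward pass.
    probs = np.stack([stochastic_forward(image) for _ in range(num_samples)])
    mean_prob = probs.mean(axis=0)
    # Assumed aleatoric term: expected Bernoulli variance of each pass.
    aleatoric = (probs * (1.0 - probs)).mean(axis=0)
    # Assumed epistemic term: spread of the predictions across passes.
    epistemic = probs.var(axis=0)
    return mean_prob, aleatoric, epistemic


def make_toy_stochastic_model(rng, drop_rate=0.5):
    """Hypothetical stand-in for a mask head: a 1x1 linear map with dropout."""
    weights = rng.normal(size=(3,))

    def forward(image):
        # Sample a fresh dropout mask on every call (inverted dropout scaling).
        keep = rng.random(weights.shape) >= drop_rate
        dropped = weights * keep / (1.0 - drop_rate)
        logits = image @ dropped  # (H, W, 3) @ (3,) -> (H, W)
        return 1.0 / (1.0 + np.exp(-logits))

    return forward


rng = np.random.default_rng(0)
image = rng.normal(size=(8, 8, 3))  # toy 8x8 "image" with 3 channels
model = make_toy_stochastic_model(rng)
mean_prob, aleatoric, epistemic = mc_dropout_mask_uncertainty(
    model, image, num_samples=50
)
```

Under this assumed decomposition the two maps sum exactly to the total predictive variance of the averaged mask, `mean_prob * (1 - mean_prob)`, so each map can be inspected spatially (for example along object contours) without double counting.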