Searching for objects in cluttered environments requires selecting efficient viewpoints and manipulation actions to remove occlusions and reduce uncertainty in object locations, shapes, and categories. In this work, we address the problem of manipulation-enhanced semantic mapping, where a robot has to efficiently identify all objects in a cluttered shelf. Although Partially Observable Markov Decision Processes~(POMDPs) are standard for decision-making under uncertainty, representing unstructured interactive worlds remains challenging in this formalism. To address this, we define a POMDP whose belief is summarized by a metric-semantic grid map and propose a novel framework that uses neural networks to perform map-space belief updates, reasoning efficiently and simultaneously about object geometries, locations, categories, occlusions, and manipulation physics. Further, to enable accurate information gain analysis, the learned belief updates should maintain calibrated estimates of uncertainty. Therefore, we propose Calibrated Neural-Accelerated Belief Updates (CNABUs) to learn a belief propagation model that generalizes to novel scenarios and provides confidence-calibrated predictions for unknown areas. Experiments show that our POMDP planner improves map completeness and accuracy over existing methods in challenging simulations and transfers successfully to real-world cluttered shelves in a zero-shot fashion.
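To make the link between map-space beliefs and information gain concrete, the following is a minimal sketch, not the paper's implementation. It assumes the belief is stored as a per-cell categorical distribution over classes and measures information gain as the realized drop in total Shannon entropy between two beliefs. The function names `cell_entropy` and `information_gain`, the grid shape, and the class layout are all hypothetical choices made for illustration.

```python
import numpy as np

def cell_entropy(belief, eps=1e-12):
    """Shannon entropy (in nats) of each cell's categorical distribution.

    belief: array of shape (H, W, K) whose last axis sums to 1.
    This layout is an assumption made for illustration.
    """
    p = np.clip(belief, eps, 1.0)  # avoid log(0) for zero-probability classes
    return -np.sum(p * np.log(p), axis=-1)

def information_gain(prior, posterior):
    """Total entropy reduction over the map when moving from prior to posterior."""
    return float(np.sum(cell_entropy(prior) - cell_entropy(posterior)))

# Hypothetical 2x2 map with 3 classes (e.g. free space and two object categories).
prior = np.full((2, 2, 3), 1.0 / 3.0)  # maximally uncertain in every cell
posterior = prior.copy()
posterior[0, 0] = [1.0, 0.0, 0.0]      # one cell is resolved with certainty

gain = information_gain(prior, posterior)
print(f"information gain: {gain:.4f} nats")
```

In this sketch, an overconfident belief would report low entropy in cells that are in fact unknown, so a planner ranking actions by entropy reduction would undervalue observing them. This is one way to read the abstract's requirement that learned belief updates stay calibrated.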