For a mobile service robot to take over arbitrary tasks in open-world settings as humans do, it needs holistic scene perception for decision-making and high-level control. This paper presents a human-inspired scene perception model to narrow the gap between human and robotic capabilities. The approach adopts fundamental neuroscience concepts, such as a three-part split of perception into recognition, knowledge representation, and knowledge interpretation. A recognition system separates background from foreground to integrate exchangeable image-based object detectors and SLAM; a multi-layer knowledge base represents scene information in a hierarchical structure and offers interfaces for high-level control; and knowledge interpretation methods apply spatio-temporal scene analysis and perceptual learning for self-adjustment. A single-setting ablation study evaluates the impact of each component on overall performance in a fetch-and-carry scenario across two simulated environments and one real-world environment.