Embodied reasoning systems integrate robotic hardware and cognitive processes to perform complex tasks, typically in response to a natural language query about a specific physical environment. Such tasks usually involve either updating beliefs about the scene or physically interacting with it to change its state (e.g., sorting objects from lightest to heaviest). To facilitate the development of such systems, we introduce CLIER (Closed Loop Interactive Embodied Reasoning), a new modular approach that accounts for measurements of non-visual object properties, changes in the scene caused by external disturbances, and uncertain outcomes of robotic actions. CLIER performs multi-modal reasoning and action planning, and generates a sequence of primitive actions that can be executed by a robot manipulator. Our method operates in a closed loop, responding to changes in the environment. We develop our approach using the MuBle simulation environment and test it on 10 interactive benchmark scenarios. We extensively evaluate it in simulation and in real-world manipulation tasks, achieving success rates above 76% and 64%, respectively.
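The closed-loop behavior described above can be illustrated with a toy version of the sorting example. The sketch below is not the CLIER implementation: `Scene`, `closed_loop_sort`, `fail_prob`, and `disturb_prob` are hypothetical names chosen for illustration. It assumes an object's weight is observable only through a `weigh` primitive (a non-visual measurement), that a `swap` primitive can silently fail, and that an external agent may rearrange objects between actions. The loop re-observes the scene and replans one primitive at a time, so failures and disturbances are corrected on the next iteration.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Scene:
    """Hypothetical toy environment: objects placed in a row of slots."""
    slots: list            # slots[i] is the name of the object in slot i
    true_weights: dict     # hidden; only observable through weigh()
    rng: random.Random = field(default_factory=lambda: random.Random(0))
    fail_prob: float = 0.3     # assumed chance that a swap silently fails
    disturb_prob: float = 0.2  # assumed chance of an external disturbance

    def weigh(self, name):
        # Non-visual measurement primitive: returns the object's weight.
        return self.true_weights[name]

    def swap(self, i, j):
        # Manipulation primitive with an uncertain outcome.
        if self.rng.random() >= self.fail_prob:
            self.slots[i], self.slots[j] = self.slots[j], self.slots[i]

    def step_world(self):
        # External disturbance: occasionally swaps two random objects.
        if self.rng.random() < self.disturb_prob:
            i, j = self.rng.sample(range(len(self.slots)), 2)
            self.slots[i], self.slots[j] = self.slots[j], self.slots[i]


def closed_loop_sort(scene, max_steps=200):
    """Sort objects lightest to heaviest, re-observing after every action."""
    beliefs = {}  # weights measured so far
    actions = []  # executed primitive actions
    for _ in range(max_steps):
        observed = list(scene.slots)  # re-observe the current scene
        # First gather missing non-visual information.
        unknown = [o for o in observed if o not in beliefs]
        if unknown:
            beliefs[unknown[0]] = scene.weigh(unknown[0])
            actions.append(("weigh", unknown[0]))
            scene.step_world()
            continue
        goal = sorted(observed, key=beliefs.__getitem__)
        if observed == goal:
            return actions  # goal holds in the latest observation
        # Plan only the next primitive: fix the first misplaced slot.
        i = next(k for k in range(len(observed)) if observed[k] != goal[k])
        j = observed.index(goal[i])
        scene.swap(i, j)
        actions.append(("swap", i, j))
        scene.step_world()
    raise RuntimeError("goal not reached within max_steps")
```

With `fail_prob=0.0` and `disturb_prob=0.0` the loop reduces to open-loop planning. With nonzero values, a failed swap or an external disturbance simply shows up in the next observation and triggers a new plan, which is the essence of operating in a closed loop.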