In this paper, we propose a novel task, Manipulation Question Answering (MQA), in which a robot performs manipulation actions to change the environment in order to answer a given question. To solve this problem, we propose a framework consisting of a QA module and a manipulation module. For the QA module, we adopt a method developed for the Visual Question Answering (VQA) task. For the manipulation module, we design a Deep Q Network (DQN) model that generates manipulation actions for the robot to interact with the environment. We consider the setting in which the robot continuously manipulates objects inside a bin until the answer to the question is found. In addition, we establish a novel dataset in a simulation environment that contains a variety of object models, scenarios, and corresponding question-answer pairs. Extensive experiments have been conducted to validate the effectiveness of the proposed framework.
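The interaction between the two modules can be illustrated with a minimal sketch of the manipulate-until-answered loop. This is not the paper's implementation: every name here (`select_action`, `run_episode`, `make_toy_bin`, `demo`) is hypothetical, the fixed Q-values stand in for a trained DQN, and the rule-based `try_answer` stands in for a VQA model. The toy bin is a pile in which only the top object is visible.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple


def select_action(q_values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """Epsilon-greedy choice over Q-values, the usual action rule paired with DQN."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def run_episode(
    observe: Callable[[], Optional[str]],
    q_function: Callable[[Optional[str]], Sequence[float]],
    try_answer: Callable[[Optional[str]], Optional[str]],
    apply_action: Callable[[int], None],
    max_steps: int = 10,
    epsilon: float = 0.0,
    seed: int = 0,
) -> Tuple[Optional[str], int]:
    """Manipulate the scene until the QA module commits to an answer.

    Returns the answer (or None if the step budget runs out) and the
    number of manipulation steps taken.
    """
    rng = random.Random(seed)
    for step in range(max_steps):
        obs = observe()
        answer = try_answer(obs)  # QA module: None means "cannot tell yet"
        if answer is not None:
            return answer, step
        action = select_action(q_function(obs), epsilon, rng)
        apply_action(action)  # manipulation module changes the scene
    return try_answer(observe()), max_steps


def make_toy_bin(pile: List[str]):
    """Hypothetical bin: a pile whose top object (index 0) is the only visible one."""
    def observe() -> Optional[str]:
        return pile[0] if pile else None

    def apply_action(action: int) -> None:
        # Action 0 removes the top object; action 1 is a no-op (assumed action set).
        if action == 0 and pile:
            pile.pop(0)

    return observe, apply_action


def demo() -> Tuple[Optional[str], int]:
    """Toy question: 'Is there a key in the bin?'"""
    observe, apply_action = make_toy_bin(["box", "book", "key"])

    def try_answer(obs: Optional[str]) -> Optional[str]:
        if obs == "key":
            return "yes"
        if obs is None:  # bin is empty, so no key was found
            return "no"
        return None

    # Stand-in for a trained DQN: always prefers removing the top object.
    q_function = lambda obs: [1.0, 0.0]
    return run_episode(observe, q_function, try_answer, apply_action)
```

In this toy setting, calling `demo()` removes the two occluding objects before the key becomes visible and the question can be answered. A learned Q-function would replace the fixed preference, and a VQA model operating on images would replace the rule-based answerer.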