In an era of rapid advances in artificial intelligence and robotics, enabling machines to interact with and understand their environment is a critical research goal. In this paper, we propose Answerability Fields, a novel approach to predicting answerability within complex indoor environments. Leveraging a 3D question answering dataset, we construct a comprehensive Answerability Fields dataset that encompasses diverse scenes and questions from ScanNet. Using a diffusion model, we infer and evaluate these Answerability Fields, demonstrating the importance of objects and their locations for answering questions within a scene. Our results show the efficacy of Answerability Fields in guiding scene-understanding tasks, laying the foundation for their use in enhancing interactions between intelligent agents and their environments.