To enable progress toward egocentric agents capable of understanding everyday tasks specified in natural language, we propose a benchmark and a synthetic dataset called Egocentric Task Verification (EgoTV). EgoTV contains multi-step tasks with multiple sub-task decompositions, state changes, object interactions, and sub-task ordering constraints, along with abstracted task descriptions that give only partial details about how to accomplish each task. We also propose a novel Neuro-Symbolic Grounding (NSG) approach that enables causal, temporal, and compositional reasoning about such tasks. We demonstrate NSG's task tracking and verification capabilities on our EgoTV dataset and on CTV, a real-world dataset derived from CrossTask. We release the EgoTV and CTV datasets and the NSG model to support future research on egocentric assistive agents.
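The task verification problem described above can be illustrated with a toy, purely symbolic check. This is a minimal sketch and not the NSG model. It assumes that sub-tasks have already been recognized in the video as discrete labels, and every name in it (`verify_task`, `verify_any`, `heat_apple`, and so on) is hypothetical. A task counts as accomplished if, for at least one of its sub-task decompositions, every required sub-task occurs and each ordering constraint holds. Sub-tasks left unordered by the description (here, heating and slicing) may happen in either order.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical task specification: a set of required sub-tasks plus
# ordering constraints, where (a, b) means "a must happen before b".
Constraint = Tuple[str, str]
Decomposition = Tuple[Set[str], List[Constraint]]


def verify_task(observed: Sequence[str], required: Set[str],
                before: List[Constraint]) -> bool:
    """True if every required sub-task occurs in `observed` and, for each
    (a, b) constraint, the first occurrence of a precedes the first of b."""
    first_seen: Dict[str, int] = {}
    for step, name in enumerate(observed):
        first_seen.setdefault(name, step)  # keep only the first occurrence
    if not required.issubset(first_seen):
        return False  # some required sub-task never happened
    return all(
        a in first_seen and b in first_seen and first_seen[a] < first_seen[b]
        for a, b in before
    )


def verify_any(observed: Sequence[str],
               decompositions: List[Decomposition]) -> bool:
    """A task with several valid decompositions is verified if any one holds."""
    return any(verify_task(observed, req, cons) for req, cons in decompositions)


# Hypothetical example: "heat and slice an apple, then put it down".
# Heating and slicing may happen in either order; both must precede placing.
REQUIRED = {"heat_apple", "slice_apple", "place_apple"}
BEFORE = [("heat_apple", "place_apple"), ("slice_apple", "place_apple")]

print(verify_task(["slice_apple", "heat_apple", "place_apple"], REQUIRED, BEFORE))  # True
print(verify_task(["heat_apple", "place_apple", "slice_apple"], REQUIRED, BEFORE))  # False
```

The first-occurrence rule is one simple assumption about how to read "a before b"; other semantics are possible. The sketch also omits the hard part of the real problem, grounding these symbolic sub-tasks in egocentric video.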