Intelligent assistance involves not only understanding but also action. Existing ego-centric video datasets contain rich annotations of the videos, but not of the actions an intelligent assistant could perform in the moment. To address this gap, we release PARSE-Ego4D, a new set of personal action recommendation annotations for the Ego4D dataset. We take a multi-stage approach to generating and evaluating these annotations. First, we used a prompt-engineered large language model (LLM) to generate context-aware action suggestions, yielding over 18,000 of them. While these synthetic action suggestions are valuable, the inherent limitations of LLMs necessitate human evaluation. To ensure high-quality and user-centered recommendations, we conducted a large-scale human annotation study that grounds all of PARSE-Ego4D in human preferences. We analyze inter-rater agreement and evaluate participants' subjective preferences. Based on our synthetic dataset and the complete human annotations, we propose several new tasks for action suggestion from ego-centric videos, and we encourage novel solutions that reduce latency and energy requirements. The annotations in PARSE-Ego4D will support researchers and developers who are building action recommendation systems for augmented and virtual reality.
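The inter-rater agreement analysis mentioned above can be carried out with a standard chance-corrected statistic such as Fleiss' kappa, which applies when each item is rated by a fixed number of annotators choosing from a fixed set of categories. The sketch below is illustrative, not the paper's exact procedure; the rating matrix is a made-up toy example (rows are annotated clips, columns are rating categories, entries count how many annotators chose that category).

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a rating matrix.

    counts[i][j] = number of raters who assigned category j to item i.
    Every row must sum to the same number of raters n (n >= 2).
    Returns a float in [-1, 1]; 1 means perfect agreement,
    0 means agreement no better than chance.
    """
    N = len(counts)                      # number of items
    n = sum(counts[0])                   # raters per item (assumed constant)
    k = len(counts[0])                   # number of categories

    # Per-item observed agreement: fraction of concordant rater pairs.
    P_bar = sum(
        (sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts
    ) / N

    # Chance agreement from the marginal category proportions.
    p = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    P_e = sum(pj * pj for pj in p)

    return (P_bar - P_e) / (1 - P_e)


# Toy example: 3 clips, 2 raters, 2 categories ("helpful" / "not helpful").
# All raters agree on every clip, so kappa is 1.0.
unanimous = [[2, 0], [2, 0], [0, 2]]
print(fleiss_kappa(unanimous))
```

For real analyses, an off-the-shelf implementation such as `statsmodels.stats.inter_rater.fleiss_kappa` avoids edge cases (e.g. `P_e == 1` when a single category dominates completely).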