We introduce EgoLife, a project to develop an egocentric life assistant that accompanies users and enhances personal efficiency through AI-powered wearable glasses. To lay the foundation for this assistant, we conducted a comprehensive data collection study in which six participants lived together for one week, continuously recording their daily activities (including discussions, shopping, cooking, socializing, and entertainment) with AI glasses for multimodal egocentric video capture, along with synchronized third-person-view video references. This effort resulted in the EgoLife Dataset, a comprehensive 300-hour egocentric, interpersonal, multiview, and multimodal daily-life dataset with intensive annotation. Leveraging this dataset, we introduce EgoLifeQA, a suite of long-context, life-oriented question-answering tasks designed to provide meaningful assistance in daily life by addressing practical questions such as recalling past relevant events, monitoring health habits, and offering personalized recommendations. To address the key technical challenges of (1) developing robust visual-audio models for egocentric data, (2) enabling identity recognition, and (3) facilitating long-context question answering over extensive temporal information, we introduce EgoButler, an integrated system comprising EgoGPT and EgoRAG. EgoGPT is an omni-modal model trained on egocentric datasets that achieves state-of-the-art performance on egocentric video understanding. EgoRAG is a retrieval-based component that supports answering ultra-long-context questions. Our experimental studies verify their working mechanisms and reveal critical factors and bottlenecks, guiding future improvements. By releasing our datasets, models, and benchmarks, we aim to stimulate further research in egocentric AI assistants.