TeleEgo: Benchmarking Egocentric AI Assistants in the Wild

Jiaqi Yan, Ruilong Ren, Jingren Liu, Shuning Xu, Ling Wang, Yiheng Wang, Yun Wang, Long Zhang, Xiangyu Chen, Changzhi Sun, Jixiang Luo, Dell Zhang, Hao Sun, Chi Zhang, Xuelong Li

Egocentric AI assistants in real-world settings must process multi-modal inputs (video, audio, text), respond in real time, and retain evolving long-term memory. However, existing benchmarks typically evaluate these abilities in isolation, lack realistic streaming scenarios, or support only short-term tasks. We introduce TeleEgo, a long-duration, streaming, omni-modal benchmark for evaluating egocentric AI assistants in realistic daily contexts. The dataset contains more than 14 hours per participant of synchronized egocentric video, audio, and text across four domains: work & study, lifestyle & routines, social activities, and outings & culture. All data is aligned on a unified global timeline and includes high-quality visual narrations and speech transcripts, curated through human refinement. TeleEgo defines 12 diagnostic subtasks across three core capabilities: Memory (recalling past events), Understanding (interpreting the current moment), and Cross-Memory Reasoning (linking distant events). It contains 3,291 human-verified QA items spanning multiple question formats (single-choice, binary, multi-choice, and open-ended), evaluated strictly in a streaming setting. We propose two key metrics, Real-Time Accuracy and Memory Persistence Time, to jointly assess correctness, temporal responsiveness, and long-term retention. TeleEgo provides a realistic and comprehensive evaluation to advance the development of practical AI assistants.

