SuperMemory-VQA: An Egocentric Visual Question-Answering Benchmark for Long-Horizon Memory

AI glasses present a compelling platform for AI agents to serve as personalized memory assistants. To be genuinely useful, such systems must move beyond short-term video comprehension and address the memory gaps that people experience for practical, personal, or social purposes, working over longitudinal egocentric video streams. However, existing egocentric datasets predominantly focus on action recognition or generic question answering over short clips, measuring perceptual capabilities rather than realistic human memory needs. We introduce SuperMemory-VQA, an egocentric visual question answering (VQA) dataset for evaluating AI assistants on practical, long-horizon memory tasks. It contains 52.9 hours of everyday activities recorded with AI glasses, including synchronized RGB video, audio transcription, eye gaze, IMU, and SLAM trajectories. Through a human-verified annotation pipeline, we construct 4,853 grounded question-answer pairs that span object and location memory, intent recall, visual scene recall, timeline reconstruction, conversational memory, and in-context retrieval. Each question is posed as multiple-choice with an explicit "unanswerable" option to test robustness to hallucination. Benchmarking leading agentic frameworks and LLM backbones reveals that existing systems remain far from reliable on real-world memory tasks, highlighting the need for new grounded AI memory architectures that answer only when the evidence is sufficient. A participant survey further indicates that our questions are realistic, useful, and aligned with everyday memory needs.

