Personalized digital health support requires long-horizon, cross-dimensional reasoning over heterogeneous lifestyle signals, and recent advances in mobile sensing and large language models (LLMs) make such support increasingly feasible. However, the capabilities of current LLMs in this setting remain unclear due to the lack of systematic benchmarks. In this paper, we introduce LifeAgentBench, a large-scale QA benchmark for long-horizon, cross-dimensional, and multi-user lifestyle health reasoning, containing 22,573 questions spanning basic retrieval to complex reasoning. We release an extensible benchmark construction pipeline and a standardized evaluation protocol to enable reliable and scalable assessment of LLM-based health assistants. We then systematically evaluate 11 leading LLMs on LifeAgentBench and identify key bottlenecks in long-horizon aggregation and cross-dimensional reasoning. Motivated by these findings, we propose LifeAgent, a strong baseline health-assistant agent that integrates multi-step evidence retrieval with deterministic aggregation, achieving significant improvements over two widely used baselines. Case studies further demonstrate its potential in realistic daily-life scenarios. The benchmark is publicly available at https://anonymous.4open.science/r/LifeAgentBench-CE7B.