Although ubiquitous wearable sensors capture a wealth of behavioral and physiological information, transforming these signals into personalized health insights remains challenging. Specifically, converting low-level sensor data into representations that characterize higher-level states is difficult because of high phenotypic diversity and variation in individual baseline health, physiology, and lifestyle. Moreover, collecting wearable data paired with health outcome annotations is laborious and expensive, and retrospective annotation is practically infeasible, so data with high-quality labels are scarce. To overcome these limitations, we propose a foundation model for wearable health that is pretrained on more than one trillion minutes of unlabeled sensor signals drawn from a cohort of five million participants. We demonstrate that jointly scaling model capacity and pretraining data volume yields systematic performance improvements across a diverse set of 35 health prediction tasks spanning cardiovascular, metabolic, sleep, and mental health, as well as lifestyle choices and demographic factors. We find that this population-scale representation enables label-efficient few-shot learning and generative capabilities for robust daily metric estimation. To further leverage the learned representation, we deploy a classroom of LLM agents to autonomously search the space of downstream predictive heads built on the model embeddings, showing broad performance improvements that increase with LLM capacity. Finally, we show how integrating these downstream predictors into a Personal Health Agent can support model responses that are more relevant, contextually aware, and safe, and we validate this with 1,860 ratings from a cohort of clinicians.