Supporting the health and well-being of dynamic populations around the world requires governmental agencies, organizations, and researchers to understand and reason over complex relationships between human behavior and local contexts in order to identify high-risk groups and strategically allocate limited resources. Traditional approaches to these classes of problems often entail developing manually curated, task-specific features and models to represent human behavior and the natural and built environment, which can be challenging to adapt to new or even related tasks. To address this, we introduce a Population Dynamics Foundation Model (PDFM) that aims to capture the relationships between diverse data modalities and is applicable to a broad range of geospatial tasks. We first construct a geo-indexed dataset for postal codes and counties across the United States, capturing rich aggregated information on human behavior from maps, busyness, and aggregated search trends, together with environmental factors such as weather and air quality. We then model this data and the complex relationships between locations using a graph neural network, producing embeddings that can be adapted to a wide range of downstream tasks using relatively simple models. We evaluate the effectiveness of our approach by benchmarking it on 27 downstream tasks spanning three distinct domains: health indicators, socioeconomic factors, and environmental measurements. The approach achieves state-of-the-art performance on all 27 geospatial interpolation tasks and on 25 of the 27 extrapolation and super-resolution tasks. We also combine the PDFM with a state-of-the-art forecasting foundation model, TimesFM, to predict unemployment and poverty, achieving performance that surpasses fully supervised forecasting. The full set of embeddings and sample code are publicly available for researchers.
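The idea of adapting fixed location embeddings to a downstream task with a relatively simple model can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the embeddings and target below are synthetic placeholders generated at random, the embedding dimension and sample sizes are arbitrary, and ridge regression is our own choice of "simple model". The random hold-out split loosely mimics an interpolation setting, where the target is known for some locations and must be filled in for others.

```python
import numpy as np


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    # Solve (Xc^T Xc + alpha * I) w = Xc^T (y - y_mean) for the weights.
    A = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    w = np.linalg.solve(A, Xc.T @ (y - y_mean))
    b = y_mean - x_mean @ w
    return w, b


def r2_score(y_true, y_pred):
    """Coefficient of determination."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-ins (assumption): one embedding row per location and a
# target that depends linearly on the embedding plus noise.
rng = np.random.default_rng(0)
n_locations, dim = 500, 16
embeddings = rng.normal(size=(n_locations, dim))
true_w = rng.normal(size=dim)
target = embeddings @ true_w + 0.1 * rng.normal(size=n_locations)

# Interpolation-style split: hide the target for a random 20% of locations.
perm = rng.permutation(n_locations)
test_idx, train_idx = perm[:100], perm[100:]

w, b = fit_ridge(embeddings[train_idx], target[train_idx], alpha=1.0)
pred = embeddings[test_idx] @ w + b
r2 = r2_score(target[test_idx], pred)
print(f"held-out R^2: {r2:.3f}")
```

With real embeddings, the rows of `embeddings` would be loaded from the released files and joined to task labels by location identifier; an extrapolation-style evaluation would instead hold out whole regions rather than random locations.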