Fine-grained migration data illuminate important demographic, environmental, and health phenomena. However, migration datasets within the United States remain lacking: publicly available Census data are neither spatially nor temporally granular, while proprietary data offer higher resolution but carry demographic and other biases. To address these limitations, we develop a scalable method based on iterative proportional fitting that reconciles high-resolution but biased proprietary data with low-resolution but more reliable Census data. We apply this method to produce MIGRATE, a dataset of annual migration matrices from 2010 to 2019 that captures flows between 47.4 billion pairs of Census Block Groups, about four thousand times more granular than publicly available data. These estimates are highly correlated with external ground-truth datasets, and they improve accuracy and reduce bias relative to raw proprietary data. We publicly release MIGRATE estimates and provide a case study illustrating how they reveal granular patterns of migration in response to California wildfires.
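To make the core idea concrete, the following is a minimal sketch of textbook two-way iterative proportional fitting, not the scalable procedure used to build MIGRATE. It assumes a small, strictly positive seed matrix standing in for biased proprietary flows, and row and column totals standing in for more reliable Census margins. The function name `ipf`, the toy numbers, and the tolerance are all illustrative assumptions.

```python
import numpy as np


def ipf(seed, row_targets, col_targets, max_iters=1000, tol=1e-10):
    """Textbook two-way IPF: rescale `seed` until its margins match the targets.

    Illustrative only; the names and defaults here are assumptions, not the
    MIGRATE implementation.
    """
    est = np.asarray(seed, dtype=float).copy()
    row_targets = np.asarray(row_targets, dtype=float)
    col_targets = np.asarray(col_targets, dtype=float)

    for _ in range(max_iters):
        # Row step: scale each row so that row sums equal the row targets.
        row_sums = est.sum(axis=1)
        row_scale = np.divide(row_targets, row_sums,
                              out=np.zeros_like(row_sums), where=row_sums > 0)
        est *= row_scale[:, None]

        # Column step: scale each column so that column sums equal the column targets.
        col_sums = est.sum(axis=0)
        col_scale = np.divide(col_targets, col_sums,
                              out=np.zeros_like(col_sums), where=col_sums > 0)
        est *= col_scale[None, :]

        # Columns now match exactly, so convergence is judged on the rows.
        if np.abs(est.sum(axis=1) - row_targets).max() < tol:
            break
    return est


# Toy seed: fine-grained but biased flow counts (rows = origins, columns = destinations).
seed = np.array([[8.0, 1.0, 1.0],
                 [2.0, 5.0, 1.0],
                 [1.0, 2.0, 6.0]])

# Toy margins: trusted totals per origin and per destination (both sum to 240).
row_targets = np.array([100.0, 80.0, 60.0])
col_targets = np.array([90.0, 70.0, 80.0])

estimate = ipf(seed, row_targets, col_targets)
print(estimate.round(2))
print(estimate.sum(axis=1), estimate.sum(axis=0))
```

In this sketch the fitted matrix keeps the relative structure of the seed while its margins agree with the trusted totals, which is the sense in which high-resolution but biased data can be reconciled with coarser but more reliable data.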