Accurate and timely population data are essential for disaster response and humanitarian planning, but traditional censuses often cannot capture rapid demographic changes. Social media data offer a promising alternative for dynamic population monitoring, but their representativeness remains poorly understood and stringent privacy requirements limit their reliability. Here, we address these limitations in the context of the Philippines by calibrating Facebook user counts against the country's 2020 census figures. First, we find that differential privacy techniques commonly applied to social media-based population datasets disproportionately mask low-population areas. To address this, we propose a Bayesian imputation approach to recover missing values, restoring data coverage for $5.5\%$ of rural areas. Second, using the imputed social media data together with predictors such as urbanisation level, demographic composition, and socio-economic status, we develop a statistical model for the proportion of Facebook users in each municipality, which links observed Facebook user counts to true population levels. Out-of-sample validation shows that the results generalise well, with errors as low as ${\approx}18\%$ and ${\approx}24\%$ for urban and rural Facebook user proportions, respectively. We further demonstrate that accounting for overdispersion and spatial correlations in the data is crucial to obtain accurate estimates and appropriate credible intervals. Importantly, as predictors change over time, the models can be used to regularly update the population predictions, providing a dynamic complement to census-based estimates. These results have direct implications for humanitarian response in disaster-prone regions and offer a general framework for using biased social media signals to generate reliable and timely population data.
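To illustrate how a modelled Facebook user proportion can link an observed user count to a population estimate, the following minimal sketch may help. It is not the authors' model: the logistic link, the beta-binomial variance used to represent overdispersion, and all function names, coefficients, and input values are assumptions introduced purely for illustration. Spatial correlation and the Bayesian imputation step are omitted.

```python
import math


def facebook_proportion(predictors, coefficients, intercept):
    """Assumed logistic link: map municipality predictors to a user share in (0, 1)."""
    linear = intercept + sum(b * x for b, x in zip(coefficients, predictors))
    return 1.0 / (1.0 + math.exp(-linear))


def estimate_population(facebook_users, proportion):
    """Scale an observed Facebook user count up to a population estimate."""
    if not 0.0 < proportion <= 1.0:
        raise ValueError("proportion must lie in (0, 1]")
    return facebook_users / proportion


def beta_binomial_variance(population, proportion, dispersion):
    """Variance of a beta-binomial user count.

    dispersion (rho, in [0, 1)) inflates the binomial variance by the
    factor 1 + (population - 1) * rho; rho = 0 recovers the binomial case.
    """
    binomial_var = population * proportion * (1.0 - proportion)
    return binomial_var * (1.0 + (population - 1.0) * dispersion)


if __name__ == "__main__":
    # Hypothetical municipality: two standardised predictors
    # (e.g. urbanisation level and a socio-economic index) with made-up coefficients.
    p = facebook_proportion([1.2, -0.3], [0.8, 0.5], intercept=-0.4)
    n_hat = estimate_population(facebook_users=12_000, proportion=p)
    print(f"modelled proportion: {p:.3f}")
    print(f"population estimate: {n_hat:.0f}")
    print(f"count variance (rho=0.01): {beta_binomial_variance(n_hat, p, 0.01):.0f}")
```

Because the predictors enter only through the modelled proportion, refreshing them over time yields updated population estimates from new user counts without a new census, which is the dynamic use the abstract describes.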