Poverty maps are essential tools for governments and NGOs to track socioeconomic changes and to allocate infrastructure and services adequately to places in need. Sensor and online crowd-sourced data, combined with machine learning methods, have recently enabled a breakthrough in poverty map inference. However, these methods do not capture local wealth fluctuations, and they are not optimized to produce accountable results that guarantee accurate predictions for all sub-populations. Here, we propose a pipeline of machine learning models to infer the mean and standard deviation of wealth across multiple geographically clustered populated places, and illustrate their performance in Sierra Leone and Uganda. These models leverage seven independent and freely available feature sources based on satellite images, and metadata collected via online crowd-sourcing and social media. Our models show that combined metadata features are the best predictors of wealth in rural areas, outperforming image-based models, which in turn are best at predicting the highest wealth quintiles. Our results recover the local mean and variation of wealth, and correctly capture the positive yet non-monotonic correlation between them. We further demonstrate the capabilities and limitations of model transfer across countries, as well as the effects of data recency and other biases. Our methodology provides open tools for building more transparent and interpretable models that help governments and NGOs make informed decisions based on data availability, urbanization level, and poverty thresholds.