It is increasingly important to generate synthetic populations with explicit coordinates rather than coarse geographic areas, yet no established methods exist to achieve this. One reason is that, unlike other continuous variables, latitude and longitude exhibit large empty spaces and highly uneven densities. To address this, we propose a population synthesis algorithm that first maps spatial coordinates into a more regular latent space using Normalizing Flows (NF), and then combines these latent coordinates with other features in a Variational Autoencoder (VAE) to generate synthetic populations. This approach also learns the joint distribution of spatial and non-spatial features, exploiting spatial autocorrelation. We demonstrate the method by generating synthetic homes that share the statistical properties of real homes across 121 datasets covering diverse geographies. We further propose an evaluation framework that measures both spatial accuracy and practical utility while ensuring that privacy is preserved. Our results show that the NF+VAE architecture outperforms popular benchmarks, including copula-based methods and uniform allocation within geographic areas. The ability to generate geolocated synthetic populations at fine spatial resolution opens the door to applications requiring detailed geography, from household responses to floods to epidemic spread, evacuation planning, and transport modeling.
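The coordinate-mapping step can be illustrated with a minimal sketch, assuming a RealNVP-style flow built from two affine coupling layers on (latitude, longitude); this is not the authors' architecture. The normalization constants `MEAN` and `SCALE` and the per-layer weights in `LAYERS` are hypothetical and untrained (in practice the conditioner would be a neural network fitted by maximum likelihood), and the VAE stage is omitted. The sketch shows the three properties the approach relies on: an exact inverse, a tractable log-determinant for the change-of-variables density, and sampling by drawing a standard normal latent and inverting the flow.

```python
import math
import random

# Hypothetical normalization constants for one study area (not from the paper).
MEAN = (51.5, -0.12)
SCALE = (0.05, 0.08)

# Hypothetical, untrained (w, b) weights for each coupling layer's conditioner.
LAYERS = [(0.5, 0.1), (-0.3, 0.2)]


def standardize(lat, lon):
    """Affine rescaling of raw coordinates to roughly unit scale."""
    return (lat - MEAN[0]) / SCALE[0], (lon - MEAN[1]) / SCALE[1]


def conditioner(cond, w, b):
    """Toy conditioner: log-scale s and shift t depend only on the unchanged coordinate."""
    s = math.tanh(w * cond)  # bounded log-scale keeps exp(s) well behaved
    t = b * cond ** 2        # arbitrary nonlinear shift
    return s, t


def forward(x1, x2):
    """Map standardized coordinates x to latent z; return (z1, z2, log|det dz/dx|)."""
    log_det = 0.0
    for i, (w, b) in enumerate(LAYERS):
        if i % 2 == 0:
            s, t = conditioner(x1, w, b)
            x2 = x2 * math.exp(s) + t  # transform x2, keep x1
        else:
            s, t = conditioner(x2, w, b)
            x1 = x1 * math.exp(s) + t  # transform x1, keep x2
        log_det += s  # triangular Jacobian: log|det| of this layer is s
    return x1, x2, log_det


def inverse(z1, z2):
    """Undo the coupling layers in reverse order."""
    for i in reversed(range(len(LAYERS))):
        w, b = LAYERS[i]
        if i % 2 == 0:
            s, t = conditioner(z1, w, b)
            z2 = (z2 - t) * math.exp(-s)
        else:
            s, t = conditioner(z2, w, b)
            z1 = (z1 - t) * math.exp(-s)
    return z1, z2


def log_prob(lat, lon):
    """Change of variables: log p(lat, lon) = log N(z; 0, I) + log|det| terms."""
    x1, x2 = standardize(lat, lon)
    z1, z2, log_det = forward(x1, x2)
    log_base = -0.5 * (z1 * z1 + z2 * z2) - math.log(2 * math.pi)
    log_det_std = -math.log(SCALE[0]) - math.log(SCALE[1])
    return log_base + log_det + log_det_std


def sample(rng):
    """Draw z ~ N(0, I), invert the flow, and undo the standardization."""
    x1, x2 = inverse(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
    return MEAN[0] + SCALE[0] * x1, MEAN[1] + SCALE[1] * x2


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        lat, lon = sample(rng)
        print(f"{lat:.5f}, {lon:.5f}, log p = {log_prob(lat, lon):.3f}")
```

Because each coupling layer leaves one coordinate unchanged, its Jacobian is triangular and its log-determinant is simply the log-scale `s`; alternating which coordinate is transformed lets the flow reshape both latitude and longitude. Stacking more layers with learned conditioners is what would let such a flow absorb empty areas and dense clusters into a near-Gaussian latent space.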