Population synthesis consists of generating synthetic but realistic representations of a target population of micro-agents for behavioral modeling and simulation. We introduce a new copula-based framework that generates synthetic data for a target population for which only the empirical marginal distributions are known, using a sample from another population that shares similar marginal dependencies. This makes it possible to include a spatial component in population synthesis and to combine various sources of information to obtain more realistic population generators. Specifically, we normalize the data and treat them as realizations of a given copula, then train a generative model on the normalized data before injecting the information on the marginals. We compare the copula framework to iterative proportional fitting (IPF) and to modern probabilistic approaches such as Bayesian networks, variational auto-encoders, and generative adversarial networks. We also illustrate on American Community Survey data that the proposed method makes it possible to study the structure of the data at different geographical levels in a way that is robust to the peculiarities of the marginal distributions.
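The normalize, fit, and inject steps described above can be illustrated with a minimal sketch. It is not the implementation evaluated in this work: a Gaussian copula stands in for the generative model purely for brevity, the target marginals are assumed to be given as one-dimensional arrays of observed values, and all function names (`to_pseudo_observations`, `fit_gaussian_copula`, `sample_gaussian_copula`, `inject_marginals`, `synthesize`) are hypothetical.

```python
import numpy as np
from scipy.stats import norm, rankdata


def to_pseudo_observations(x):
    """Normalize each column to (0, 1) with its empirical CDF, rank / (n + 1)."""
    n = x.shape[0]
    return rankdata(x, axis=0) / (n + 1)


def fit_gaussian_copula(u):
    """Assumed stand-in for the generative model: correlation of normal scores."""
    z = norm.ppf(u)
    return np.corrcoef(z, rowvar=False)


def sample_gaussian_copula(corr, n, rng):
    """Draw n points on the unit hypercube with the fitted dependence."""
    z = rng.multivariate_normal(np.zeros(corr.shape[0]), corr, size=n)
    return norm.cdf(z)


def inject_marginals(u, target_marginals):
    """Map each uniform column through the target's empirical quantile function."""
    columns = []
    for j, values in enumerate(target_marginals):
        ordered = np.sort(np.asarray(values))
        # Index of the empirical quantile, clipped to stay inside the array.
        idx = np.minimum((u[:, j] * len(ordered)).astype(int), len(ordered) - 1)
        columns.append(ordered[idx])
    return np.column_stack(columns)


def synthesize(source_sample, target_marginals, n, seed=0):
    """Learn dependence from the source sample, then impose the target marginals."""
    rng = np.random.default_rng(seed)
    u = to_pseudo_observations(np.asarray(source_sample, dtype=float))
    corr = fit_gaussian_copula(u)
    u_new = sample_gaussian_copula(corr, n, rng)
    return inject_marginals(u_new, target_marginals)
```

Because the dependence structure is learned only on the normalized data, the same fitted copula can in principle be paired with the marginals of several target areas, which is what allows the marginals and the dependence to come from different sources. A richer generative model could replace the Gaussian copula without changing the other steps.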