There is an ever-growing demand for data sharing in academia and industry. However, such sharing raises concerns about personal privacy and data confidentiality. One option is to share only synthetically generated versions of the real data. Generative Adversarial Networks (GANs) are a recently popular technique that can be used for this purpose. Relational databases usually have multiple tables that are related to each other, but so far the use of GANs has essentially focused on generating single tables. This paper presents the Incremental Relational Generator (IRG), which uses GANs to synthetically generate interrelated tables. Given an empirical relational database, IRG can generate a synthetic version that can be safely shared. IRG generates the tables in some sequential order. The key idea is to construct a context, based on the tables generated so far, when using a GAN to generate the next table. Experiments with public datasets and private student data show that IRG outperforms the state of the art in terms of statistical properties and query results.
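The sequential, context-driven generation described above can be illustrated with a minimal sketch. It is not the IRG implementation: the schema, the function names, and the random stand-in for a conditional GAN are all assumptions made for illustration. It only shows the control flow of ordering tables so that parents come first and passing the already-generated tables as context for the next one.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
import random

# Hypothetical schema (an assumption, not from the paper):
# each table maps to the set of parent tables it references.
SCHEMA = {
    "student": set(),
    "course": set(),
    "enrollment": {"student", "course"},
}


def generation_order(schema):
    """Return table names so that every parent precedes its children."""
    return list(TopologicalSorter(schema).static_order())


def build_context(table, schema, generated):
    """Collect the already-generated parent rows the next table can condition on."""
    return {parent: generated[parent] for parent in sorted(schema[table])}


def toy_conditional_generator(context, n_rows, rng):
    """Stand-in for a conditional GAN (assumption): it only samples
    foreign keys from the context so that references stay valid."""
    rows = []
    for i in range(n_rows):
        row = {"id": i}
        for parent, parent_rows in context.items():
            row[f"{parent}_id"] = rng.choice(parent_rows)["id"]
        rows.append(row)
    return rows


def generate_database(schema, n_rows, seed=0):
    """Generate tables one at a time, each conditioned on earlier ones."""
    rng = random.Random(seed)
    generated = {}
    for table in generation_order(schema):
        context = build_context(table, schema, generated)
        generated[table] = toy_conditional_generator(context, n_rows[table], rng)
    return generated
```

For example, `generate_database(SCHEMA, {"student": 3, "course": 2, "enrollment": 5})` returns three tables in which every `enrollment` row refers to an existing synthetic `student` and `course`. In a real system the stand-in generator would be replaced by a trained conditional model, and the context would likely carry richer features than foreign keys alone.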