Synthetic data generation is increasingly recognized as a crucial solution to data-related challenges such as scarcity, bias, and privacy concerns. As synthetic data proliferates and the number of available generators grows, the need for a robust evaluation framework for selecting a synthetic data generator becomes more pressing. In this study, we investigate two primary questions: 1) How can we select the most suitable synthetic data generator from a set of options for a specific purpose? 2) How can we make the selection process more transparent, accountable, and auditable? To address these questions, we introduce a novel approach in which the proposed ranking algorithm is implemented as a smart contract within a permissioned blockchain framework called Sawtooth. Through comprehensive experiments and comparisons with state-of-the-art baseline ranking solutions, we demonstrate that our framework provides nuanced rankings that consider both desirable and undesirable properties. Furthermore, our framework serves as a valuable tool for selecting the most suitable synthetic data generator for a specific need while ensuring compliance with data protection principles.