With the growing demand for synthetic data to address contemporary issues in machine learning, such as data scarcity, fairness, and privacy, robust tools for assessing the utility and potential privacy risks of such data are crucial. SynthEval, a novel open-source evaluation framework, distinguishes itself from existing tools by treating categorical and numerical attributes with equal care, without assuming any particular preprocessing steps. This makes it applicable to virtually any synthetic dataset of tabular records. The tool leverages statistical and machine learning techniques to comprehensively evaluate synthetic data fidelity and privacy-preserving integrity. SynthEval integrates a wide selection of metrics that can be used independently or in highly customisable benchmark configurations, and it can easily be extended with additional metrics. In this paper, we describe SynthEval and illustrate its versatility with examples. The framework facilitates better benchmarking and more consistent comparisons of model capabilities.
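To illustrate the idea of treating categorical and numerical attributes with equal care and no assumed preprocessing, the following is a minimal, hypothetical sketch. It is not SynthEval's actual implementation; the function name and metric choices (a Kolmogorov–Smirnov statistic for numerical columns, a total variation distance for categorical columns) are assumptions made for illustration only.

```python
# Hypothetical sketch of type-aware fidelity scoring (NOT SynthEval's code):
# numerical columns are compared via the two-sample KS statistic, categorical
# columns via total variation distance over category frequencies. No encoding
# or scaling of the raw data is required beforehand.
import pandas as pd
from scipy.stats import ks_2samp

def column_fidelity(real: pd.DataFrame, synth: pd.DataFrame) -> dict:
    """Return a per-column divergence score in [0, 1]; 0 means identical."""
    scores = {}
    for col in real.columns:
        if pd.api.types.is_numeric_dtype(real[col]):
            # KS statistic: max distance between empirical CDFs
            scores[col] = ks_2samp(real[col], synth[col]).statistic
        else:
            # Total variation distance between category frequency tables
            p = real[col].value_counts(normalize=True)
            q = synth[col].value_counts(normalize=True)
            scores[col] = 0.5 * p.subtract(q, fill_value=0).abs().sum()
    return scores
```

A framework built this way can report one comparable score per attribute regardless of its type, which is the property the abstract highlights.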