MIA-EPT: Membership Inference Attack via Error Prediction for Tabular Data

Synthetic data generation plays an important role in enabling data sharing, particularly in sensitive domains like healthcare and finance. Recent advances in diffusion models have made it possible to generate realistic, high-quality tabular data, but these models may also memorize training records and leak sensitive information. Membership inference attacks (MIAs) exploit this vulnerability by determining whether a given record was used in training. While MIAs have been studied for images and text, their use against tabular diffusion models remains underexplored, despite the unique risks posed by structured attributes and limited record diversity. In this paper, we introduce MIA-EPT, Membership Inference Attack via Error Prediction for Tabular Data, a novel black-box attack specifically designed to target tabular diffusion models. MIA-EPT constructs error-based feature vectors by masking and reconstructing attributes of target records, and derives membership signals from how well these attributes are predicted. MIA-EPT operates without access to the internal components of the generative model, relying only on its synthetic data output, and was shown to generalize across multiple state-of-the-art diffusion models. We validate MIA-EPT on three diffusion-based synthesizers, achieving AUC-ROC scores of up to 0.599 and TPR@10% FPR values of up to 22.0% in our internal tests. Under the MIDST 2025 competition conditions, MIA-EPT achieved second place in the Black-box Multi-Table track (TPR@10% FPR = 20.0%). These results demonstrate that our method can uncover substantial membership leakage in synthetic tabular data, challenging the assumption that synthetic data is inherently privacy-preserving. Our code is publicly available at https://github.com/eyalgerman/MIA-EPT.
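The mask-and-reconstruct idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which predictor reconstructs a masked attribute, so the 1-nearest-neighbor lookup over synthetic rows below is a stand-in assumption, and the names `reconstruct_attribute` and `error_features` are hypothetical. The sketch handles numeric attributes only and stops at the feature vector; the downstream step that turns these vectors into a membership decision is not shown.

```python
from typing import List, Sequence

Record = Sequence[float]


def reconstruct_attribute(target: Record, synthetic: List[Record], masked: int) -> float:
    """Predict target[masked] using only the synthetic data (black-box setting).

    Assumption: a 1-nearest-neighbor lookup stands in for whatever
    predictor the real attack uses. The masked attribute is excluded
    from the distance, so the prediction never sees the true value.
    """
    def distance(row: Record) -> float:
        return sum(
            (row[k] - target[k]) ** 2
            for k in range(len(target))
            if k != masked
        )

    nearest = min(synthetic, key=distance)
    return nearest[masked]


def error_features(target: Record, synthetic: List[Record]) -> List[float]:
    """Mask each attribute in turn and record the absolute reconstruction error."""
    return [
        abs(reconstruct_attribute(target, synthetic, j) - target[j])
        for j in range(len(target))
    ]


# Toy synthetic table with three numeric attributes (illustrative values only).
synthetic = [
    (1.0, 2.0, 3.0),
    (4.0, 5.0, 6.0),
    (7.0, 8.0, 9.0),
]

seen_like = (1.0, 2.0, 3.0)    # closely reproduced in the synthetic output
unseen_like = (1.0, 2.0, 8.0)  # attribute combination the synthetic data lacks

print(error_features(seen_like, synthetic))    # [0.0, 0.0, 0.0]
print(error_features(unseen_like, synthetic))  # [3.0, 3.0, 5.0]
```

In this toy setting, the record that the synthetic data reproduces closely yields an all-zero error vector, while the other record yields large errors. Consistently low reconstruction error is the kind of signal the abstract describes as indicating membership.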
