Synthetic data generation plays an important role in enabling data sharing, particularly in sensitive domains such as healthcare and finance. Recent advances in diffusion models have made it possible to generate realistic, high-quality tabular data, but these models may also memorize training records and leak sensitive information. Membership inference attacks (MIAs) exploit this vulnerability by determining whether a given record was used in training. While MIAs have been studied for images and text, their use against tabular diffusion models remains underexplored, despite the unique risks posed by structured attributes and limited record diversity. In this paper, we introduce MIA-EPT (Membership Inference Attack via Error Prediction for Tabular Data), a novel black-box attack specifically designed to target tabular diffusion models. MIA-EPT constructs error-based feature vectors by masking and reconstructing attributes of target records, and derives membership signals from how accurately these attributes are predicted. MIA-EPT requires no access to the internal components of the generative model, relies only on its synthetic data output, and generalizes across multiple state-of-the-art diffusion models. We validate MIA-EPT on three diffusion-based synthesizers, achieving AUC-ROC scores of up to 0.599 and TPR@10% FPR values of up to 22.0% in our internal tests. Under the MIDST 2025 competition conditions, MIA-EPT achieved second place in the Black-box Multi-Table track (TPR@10% FPR = 20.0%). These results demonstrate that our method can uncover substantial membership leakage in synthetic tabular data, challenging the assumption that synthetic data is inherently privacy-preserving. Our code is publicly available at https://github.com/eyalgerman/MIA-EPT.
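The masking-and-reconstruction idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `predict_masked` and `error_features` are hypothetical, all attributes are assumed to be numeric, and a simple k-nearest-neighbour lookup over the synthetic table stands in for whatever attribute predictors MIA-EPT actually trains. Each attribute of the target record is hidden in turn, predicted from the remaining attributes using only synthetic rows, and the absolute prediction error becomes one entry of the feature vector.

```python
from typing import List, Sequence

Record = Sequence[float]


def predict_masked(synthetic: Sequence[Record], target: Record,
                   masked_idx: int, k: int = 3) -> float:
    """Predict target[masked_idx] from the k synthetic rows that are
    closest to the target on all *other* attributes (assumed stand-in
    predictor, not the one used in the paper)."""
    def dist(row: Record) -> float:
        # Squared Euclidean distance, ignoring the masked attribute.
        return sum((row[j] - target[j]) ** 2
                   for j in range(len(target)) if j != masked_idx)

    nearest = sorted(synthetic, key=dist)[:k]
    return sum(row[masked_idx] for row in nearest) / len(nearest)


def error_features(synthetic: Sequence[Record], target: Record,
                   k: int = 3) -> List[float]:
    """Mask each attribute in turn and record the absolute
    reconstruction error, giving one feature per attribute."""
    return [abs(predict_masked(synthetic, target, i, k) - target[i])
            for i in range(len(target))]


# Toy synthetic table with two numeric attributes (made-up values).
synthetic_rows = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]

# A record that the synthetic data reproduces closely yields small errors.
close_record = error_features(synthetic_rows, [1.0, 1.0], k=1)

# A record that fits the synthetic data poorly yields larger errors.
far_record = error_features(synthetic_rows, [1.0, 5.0], k=1)
```

In this sketch, a downstream attack classifier would take such error vectors as input and output a membership score, on the assumption that training members tend to be reconstructed with smaller errors. Categorical attributes would need a different error measure, for example a 0/1 mismatch indicator.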