Synthetic data generation is a key technique in modern artificial intelligence, addressing data scarcity, privacy constraints, and the need for diverse datasets in training robust models. In this work, we propose a method for generating high-quality, privacy-preserving synthetic tabular data using Tensor Networks, specifically Matrix Product States (MPS). We benchmark the MPS-based generative model against state-of-the-art models such as CTGAN, VAE, and PrivBayes, focusing on both fidelity and privacy-preserving capabilities. To ensure differential privacy (DP), we integrate noise injection and gradient clipping during training, enabling privacy guarantees via Rényi Differential Privacy accounting. Across multiple metrics of data fidelity and downstream machine learning task performance, our results show that MPS outperforms classical models, particularly under strict privacy constraints. This work highlights MPS as a promising tool for privacy-aware synthetic data generation. By combining the expressive power of tensor network representations with formal privacy mechanisms, the proposed approach offers an interpretable and scalable alternative for secure data sharing. Its structured design facilitates integration into sensitive domains where both data quality and confidentiality are critical.
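The abstract does not spell out the training procedure, so the following is only a minimal sketch of what gradient clipping combined with noise injection commonly looks like (the usual DP-SGD recipe), not the authors' implementation. The function name `dp_gradient` and the parameters `clip_norm` and `noise_multiplier` are hypothetical, the per-example gradients are assumed to be already flattened into vectors, and the Rényi Differential Privacy accounting step is not shown.

```python
import numpy as np

def dp_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """Return a privatized average gradient (illustrative sketch only).

    per_example_grads: array of shape (batch, n_params), one gradient per record.
    clip_norm: maximum L2 norm allowed for any single record's gradient (assumed name).
    noise_multiplier: Gaussian noise std expressed in units of clip_norm (assumed name).
    rng: a numpy.random.Generator used to draw the noise.
    """
    grads = np.asarray(per_example_grads, dtype=float)

    # Gradient clipping: rescale each record's gradient so its L2 norm
    # is at most clip_norm; gradients already within the bound are unchanged.
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = grads * scale

    # Noise injection: add Gaussian noise calibrated to the clipping bound
    # to the summed gradient, then average over the batch.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=grads.shape[1])
    return (clipped.sum(axis=0) + noise) / grads.shape[0]


# Usage: two records, the first of which exceeds the clipping bound.
rng = np.random.default_rng(0)
example_grads = [[3.0, 4.0], [0.3, 0.4]]
noisy_update = dp_gradient(example_grads, clip_norm=1.0, noise_multiplier=1.1, rng=rng)
```

Clipping bounds how much any single record can influence an update, and that bound is what lets the added Gaussian noise translate into a formal privacy guarantee once an accountant tracks the cumulative privacy loss over training steps.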