This study investigates privacy leakage in dimensionality reduction methods through a novel machine learning-based reconstruction attack. Under an informed adversary threat model, we develop a neural network capable of reconstructing high-dimensional data from low-dimensional embeddings. We evaluate six popular dimensionality reduction techniques: PCA, sparse random projection (SRP), multidimensional scaling (MDS), Isomap, t-SNE, and UMAP. Using both the MNIST and NIH Chest X-ray datasets, we perform a qualitative analysis to identify key factors affecting reconstruction quality. Furthermore, we assess the effectiveness of an additive noise mechanism in mitigating these reconstruction attacks. Our experimental results on both datasets reveal that the attack is effective against deterministic methods (PCA and Isomap) but ineffective against methods that employ randomness (SRP, MDS, t-SNE, and UMAP). When large-magnitude noise is added to the images before performing PCA or Isomap, the attack produces severely distorted reconstructions. For the other four methods, the reconstructions retain some recognizable features but bear little resemblance to the original images.
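The additive noise mechanism described above can be illustrated with a minimal sketch: perturb the images with Gaussian noise before fitting PCA, then compare a crude linear reconstruction against the noise-free baseline. The synthetic data matrix, the noise scale `sigma`, and the use of `PCA.inverse_transform` as a stand-in for the paper's neural-network attack are all assumptions for illustration only, not the authors' pipeline.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Toy stand-in for flattened image data (n_samples x n_pixels); the paper
# uses MNIST and NIH Chest X-ray images, so this synthetic matrix only
# demonstrates the mechanism, not the actual experiments.
X = rng.random((200, 64))

# Additive noise mechanism (assumed Gaussian; sigma is a hypothetical
# noise scale, not a value from the paper): perturb images before PCA.
sigma = 0.5
X_noisy = X + rng.normal(scale=sigma, size=X.shape)

# Baseline: PCA on clean images, reconstructed from its own embedding.
pca_clean = PCA(n_components=8).fit(X)
X_hat_clean = pca_clean.inverse_transform(pca_clean.transform(X))

# Defended: PCA fitted on noisy images; an adversary observing only the
# embedding recovers a distorted estimate of the originals.
pca_noisy = PCA(n_components=8).fit(X_noisy)
X_hat_noisy = pca_noisy.inverse_transform(pca_noisy.transform(X_noisy))

err_clean = float(np.mean((X_hat_clean - X) ** 2))
err_noisy = float(np.mean((X_hat_noisy - X) ** 2))
print(err_clean < err_noisy)  # noise degrades reconstruction fidelity
```

With a large `sigma` relative to the data scale, the reconstruction error against the original images grows substantially, mirroring the paper's finding that heavy additive noise before PCA severely distorts what an attacker can recover.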