In an unsupervised attack on variational autoencoders (VAEs), an adversary finds a small perturbation of an input sample that significantly changes its latent-space encoding, thereby compromising the reconstruction for a fixed decoder. A known cause of this vulnerability is distortion in the latent space resulting from a mismatch between the approximate latent posterior and the prior distribution. Consequently, a slight change in an input sample can move its encoding to a low- or zero-density region of the latent space, resulting in unconstrained generation. This paper demonstrates that an optimal way for an adversary to attack VAEs is to exploit a directional bias of a stochastic pullback metric tensor induced by the encoder and decoder networks. The pullback metric tensor of an encoder measures the change in infinitesimal volume from the input space to the latent space. Thus, it can be viewed as a lens for analysing the effect of input perturbations that lead to latent space distortions. We propose robustness evaluation scores based on the eigenspectrum of the pullback metric tensor. Moreover, we empirically show that the scores correlate with the robustness parameter $\beta$ of the $\beta$-VAE. Since increasing $\beta$ also degrades reconstruction quality, we demonstrate a simple alternative that uses \textit{mixup} training to fill the empty regions in the latent space, improving robustness while also improving reconstruction.
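To make the idea of a directional bias concrete, the following is a minimal sketch, not the paper's method. It assumes a hypothetical toy encoder `encoder_mean` (a fixed 2x3 weight matrix with a `tanh` nonlinearity) in place of a trained VAE, and it uses only the deterministic encoder mean, so the stochastic part of the metric is ignored. Under those assumptions it forms the pullback metric $G = J^\top J$ from a finite-difference Jacobian $J$, finds the leading eigenvector by power iteration, and compares the latent shift caused by a small step along that eigenvector with the shift caused by an equally small step along another unit direction.

```python
import math

def encoder_mean(x):
    """Hypothetical toy encoder mean, R^3 -> R^2 (stands in for a trained VAE encoder)."""
    W = [[2.0, 0.1, 0.0],
         [0.0, 0.3, 0.1]]
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]

def jacobian(f, x, eps=1e-6):
    """Central finite-difference Jacobian with shape (latent_dim, input_dim)."""
    cols = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += eps
        xm[i] -= eps
        cols.append([(a - b) / (2 * eps) for a, b in zip(f(xp), f(xm))])
    # cols[i][k] holds d f_k / d x_i; transpose so rows index latent dimensions.
    return [list(r) for r in zip(*cols)]

def pullback_metric(J):
    """G = J^T J, a symmetric positive semi-definite (input_dim x input_dim) matrix."""
    n = len(J[0])
    return [[sum(row[i] * row[j] for row in J) for j in range(n)] for i in range(n)]

def top_eigenpair(G, iters=200):
    """Leading eigenvalue and unit eigenvector of G via power iteration."""
    v = [1.0] * len(G)
    for _ in range(iters):
        w = [sum(g * vi for g, vi in zip(row, v)) for row in G]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    # Rayleigh quotient v^T G v (v has unit norm).
    lam = sum(vi * sum(g * vj for g, vj in zip(row, v)) for vi, row in zip(v, G))
    return lam, v

def latent_shift(f, x, direction, step):
    """Euclidean distance between f(x) and f(x + step * direction)."""
    z0 = f(x)
    z1 = f([xi + step * d for xi, d in zip(x, direction)])
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(z0, z1)))

x = [0.1, -0.2, 0.3]                      # arbitrary input point
G = pullback_metric(jacobian(encoder_mean, x))
lam, v = top_eigenpair(G)

step = 1e-2                               # same perturbation norm in both cases
shift_top = latent_shift(encoder_mean, x, v, step)
shift_other = latent_shift(encoder_mean, x, [0.0, 0.0, 1.0], step)
print("top eigenvalue:", lam)
print("latent shift along top eigenvector:", shift_top)
print("latent shift along another unit direction:", shift_other)
```

For small steps the latent shift along a unit direction $u$ is approximately `step` times $\sqrt{u^\top G u}$, which is why the leading eigenvector gives the largest first-order shift for a fixed perturbation norm. A real evaluation would replace the toy encoder with a trained model and would likely use automatic differentiation rather than finite differences.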