We study the recovery of training data from overparameterized autoencoder models. Given a degraded training sample, we define the recovery of the original sample as an inverse problem and formulate it as an optimization task. In this inverse problem, the trained autoencoder implicitly defines a regularizer for the particular training dataset from which we aim to retrieve samples. We turn the resulting intricate optimization task into a practical method that iteratively applies the trained autoencoder together with relatively simple computations that estimate and address the unknown degradation operator. We evaluate our method on blind inpainting, where the goal is to recover training images that have been degraded by many missing pixels in an unknown pattern. We examine various deep autoencoder architectures, such as fully connected and U-Net (with various nonlinearities and at diverse training loss values), and show that our method significantly outperforms previous methods for training data recovery from autoencoders. Importantly, our method also greatly improves recovery performance in settings that were previously considered highly challenging, and even impractical, for such retrieval.
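To make the idea of alternating between applying the autoencoder and handling an unknown mask concrete, the following is a minimal sketch under strong simplifying assumptions; it is not the method evaluated in this work. The "autoencoder" here is a hypothetical toy map (`make_toy_autoencoder`) that pulls its input halfway toward the nearest training sample, so that training samples act as attracting fixed points. The mask estimate (thresholding the difference between the autoencoder output and the observation with an assumed tolerance `tol`) and the function name `recover` are likewise illustrative choices, and missing pixels are assumed to be set to zero.

```python
import numpy as np

def make_toy_autoencoder(train, step=0.5):
    """Hypothetical stand-in for a trained overparameterized autoencoder:
    moves the input a fraction `step` of the way toward the nearest
    training sample, so training samples are attracting fixed points."""
    def f(x):
        dists = np.linalg.norm(train - x, axis=1)
        nearest = train[np.argmin(dists)]
        return x + step * (nearest - x)
    return f

def recover(y, autoencoder, n_iters=50, tol=0.05):
    """Illustrative blind-inpainting loop (an assumption, not the paper's
    algorithm): apply the autoencoder, guess which pixels were observed,
    then keep the observed values and fill in the rest."""
    x = y.copy()
    mask_hat = np.ones_like(y, dtype=bool)
    for _ in range(n_iters):
        z = autoencoder(x)
        # Pixels where the output agrees with the observation are treated
        # as observed; the remaining pixels are treated as missing.
        mask_hat = np.abs(z - y) < tol
        # Data consistency on the estimated observed pixels,
        # autoencoder output on the estimated missing ones.
        x = np.where(mask_hat, y, z)
    return x, mask_hat

# Tiny hand-made "training set" of two 8-pixel samples.
train = np.array([
    [0.9, 0.8, 0.7, 0.6, 0.9, 0.8, 0.7, 0.6],
    [0.1, 0.9, 0.1, 0.1, 0.9, 0.1, 0.1, 0.9],
])
true_mask = np.array([1, 0, 1, 1, 0, 1, 1, 0], dtype=bool)  # unknown to recover()
y = np.where(true_mask, train[0], 0.0)  # degraded sample, missing pixels zeroed

x_hat, mask_hat = recover(y, make_toy_autoencoder(train))
print("max abs error:", np.max(np.abs(x_hat - train[0])))
print("estimated mask matches true mask:", bool(np.array_equal(mask_hat, true_mask)))
```

In this toy setting the missing entries converge geometrically to the training sample because the stand-in map is a contraction toward it; a real trained autoencoder offers no such guarantee, and a missing pixel whose true value is close to zero could not be told apart from an observed one by this simple threshold.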