Recovery of Training Data from Overparameterized Autoencoders: An Inverse Problem Perspective

We study the recovery of training data from overparameterized autoencoder models. Given a degraded training sample, we define the recovery of the original sample as an inverse problem and formulate it as an optimization task. In this inverse problem, the trained autoencoder implicitly defines a regularizer for the particular training dataset from which we aim to retrieve. We turn the intricate optimization task into a practical method that iteratively applies the trained autoencoder together with relatively simple computations that estimate and address the unknown degradation operator. We evaluate our method on blind inpainting, where the goal is to recover training images in which many pixels are missing in an unknown pattern. We examine various deep autoencoder architectures, such as fully connected and U-Net (with various nonlinearities and at a range of training-loss values), and show that our method significantly outperforms previous methods for training data recovery from autoencoders. Importantly, our method also greatly improves recovery performance in settings that were previously considered highly challenging, and even impractical, for such retrieval.
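To make the idea of alternating between the autoencoder and an estimate of the unknown degradation more concrete, here is a minimal toy sketch in plain Python. It is not the paper's algorithm: the "autoencoder" is a hypothetical stand-in (`toy_autoencoder`) that moves its input halfway toward the nearest memorized training vector, missing pixels are assumed to be zeroed out, and the mask estimate uses a simple threshold `tau`. All names, the threshold rule, and the data are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple

Vector = List[float]

# Hypothetical "training set" memorized by the stand-in autoencoder.
TRAIN: List[Vector] = [
    [0.9, 0.2, 0.7, 0.4, 0.8, 0.3],
    [0.1, 0.8, 0.3, 0.9, 0.2, 0.6],
]


def toy_autoencoder(x: Sequence[float]) -> Vector:
    """Stand-in for a trained autoencoder (an assumption, not a real network):
    move x halfway toward the nearest training vector."""
    nearest = min(TRAIN, key=lambda t: sum((a - b) ** 2 for a, b in zip(t, x)))
    return [xi + 0.5 * (ti - xi) for xi, ti in zip(x, nearest)]


def recover(y: Sequence[float], steps: int = 40, tau: float = 0.05) -> Tuple[Vector, List[int]]:
    """Alternate between applying the autoencoder and re-estimating the mask.

    y     : degraded sample; missing pixels are assumed to be set to 0
    steps : number of iterations
    tau   : threshold for deciding that a pixel was observed (assumed rule)
    """
    x = list(y)
    mask = [1] * len(y)
    for _ in range(steps):
        z = toy_autoencoder(x)
        # Estimate the mask: a pixel counts as observed if the autoencoder
        # output agrees with the degraded input at that position.
        mask = [1 if abs(zi - yi) < tau else 0 for zi, yi in zip(z, y)]
        # Data consistency: keep observed pixels from y, fill the rest from z.
        x = [yi if m else zi for yi, zi, m in zip(y, z, mask)]
    return x, mask


# Usage: pixels 1 and 4 of the first training vector are zeroed out.
degraded = [0.9, 0.0, 0.7, 0.4, 0.0, 0.3]
recovered, estimated_mask = recover(degraded)
print(estimated_mask)
print([round(v, 4) for v in recovered])
```

In this toy setting the iterate is pulled toward the memorized sample while the pixels judged to be observed stay pinned to the degraded input. A real trained autoencoder gives no such guarantee, and the mask rule here is far cruder than a principled estimate of the degradation operator.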
