Beyond Moments: Robustly Learning Affine Transformations with Asymptotically Optimal Error

We present a polynomial-time algorithm for robustly learning an unknown affine transformation of the standard hypercube from samples, an important and well-studied setting for independent component analysis (ICA). Specifically, given an $\epsilon$-corrupted sample from a distribution $D$ obtained by applying an unknown affine transformation $x \mapsto Ax+s$ to the uniform distribution on the $d$-dimensional hypercube $[-1,1]^d$, our algorithm constructs $\hat{A}, \hat{s}$ such that the distribution $\hat{D}$ they induce is within total variation distance $O(\epsilon)$ of $D$, using poly$(d)$ time and samples. Total variation distance is the information-theoretically strongest possible notion of distance in our setting, and our recovery guarantees in this distance are optimal up to the absolute constant factor multiplying $\epsilon$. In particular, if the columns of $A$ are normalized to unit length, our total variation distance guarantee implies a bound on the sum of the $\ell_2$ distances between the column vectors of $A$ and $\hat{A}$: $\sum_{i=1}^d \|a_i-\hat{a}_i\|_2 = O(\epsilon)$. In contrast, the strongest known prior results yield only an $\epsilon^{O(1)}$ (relative) bound on the distance between individual $a_i$'s and their estimates, which translates into an $O(d\epsilon)$ bound on the total variation distance. Our key innovation is a new approach to ICA (even to outlier-free ICA) that circumvents the difficulties in the classical method of moments and instead relies on a new geometric certificate of correctness of an affine transformation. Our algorithm is based on a new method that iteratively improves an estimate of the unknown affine transformation whenever the requirements of the certificate are not met.
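
To make the setting concrete, the following is a minimal Python sketch of the data model and the column-wise error measure only; it does not implement the algorithm. The function names, the fixed outlier location, and the rounding of the number of corrupted points are illustrative assumptions, and a real adversary may choose the corrupted points arbitrarily.

```python
import math
import random


def sample_corrupted(A, s, n, eps, rng):
    """Draw n points x = A u + s with u uniform on [-1, 1]^d, then
    overwrite an eps fraction of them with outliers.

    Assumption: the outliers are a single fixed far-away point. This is
    only one illustrative choice of corruption.
    """
    d = len(s)
    points = []
    for _ in range(n):
        u = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        x = [sum(A[i][j] * u[j] for j in range(d)) + s[i] for i in range(d)]
        points.append(x)
    n_bad = int(round(eps * n))  # number of corrupted points (assumed rounding)
    for idx in rng.sample(range(n), n_bad):
        points[idx] = [100.0] * d
    return points


def column_error(A, A_hat):
    """Return the sum over columns i of ||a_i - a_hat_i||_2."""
    rows, cols = len(A), len(A[0])
    total = 0.0
    for j in range(cols):
        sq = sum((A[i][j] - A_hat[i][j]) ** 2 for i in range(rows))
        total += math.sqrt(sq)
    return total
```

With `A` the identity and `s = [0.5, -0.5]`, for instance, every uncorrupted point falls in the box $[-0.5, 1.5] \times [-1.5, 0.5]$, and `column_error(A, A_hat)` is the quantity $\sum_{i=1}^d \|a_i-\hat{a}_i\|_2$ that the abstract bounds by $O(\epsilon)$.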
