We explore the theoretical possibility of learning $d$-dimensional targets with $W$-parameter models by gradient flow (GF) when $W<d$. Our main result shows that if the targets are described by a particular $d$-dimensional probability distribution, then there exist models with as few as two parameters that can learn the targets with arbitrarily high success probability. On the other hand, we show that for $W<d$ there is necessarily a large subset of GF-non-learnable targets. In particular, the set of learnable targets is not dense in $\mathbb R^d$, and any subset of $\mathbb R^d$ homeomorphic to the $W$-dimensional sphere contains non-learnable targets. Finally, we observe that the model in our main theorem on almost guaranteed two-parameter learning is constructed using a hierarchical procedure and as a result is not expressible by a single elementary function. We show that this limitation is essential in the sense that such learnability can be ruled out for a large class of elementary functions.
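To make the setting concrete, here is one common way to formalize it. The symbols $\Phi$, $\mathbf w$, $\mathbf f$ and the squared-error loss are assumptions made for illustration and are not fixed by the text above. A $W$-parameter model is taken to be a differentiable map $\Phi:\mathbb R^W\to\mathbb R^d$, and GF for a target $\mathbf f\in\mathbb R^d$ follows

$$
\frac{d\mathbf w}{dt} = -\nabla_{\mathbf w} L_{\mathbf f}(\mathbf w),
\qquad
L_{\mathbf f}(\mathbf w) = \tfrac12\,\|\Phi(\mathbf w)-\mathbf f\|^2 .
$$

Along any such trajectory $\frac{d}{dt}L_{\mathbf f}(\mathbf w(t)) = -\|\nabla_{\mathbf w} L_{\mathbf f}(\mathbf w(t))\|^2 \le 0$, so the loss never increases. Under this reading, a target $\mathbf f$ would count as learnable from an initial point $\mathbf w(0)$ if $\Phi(\mathbf w(t))\to\mathbf f$, that is, $L_{\mathbf f}(\mathbf w(t))\to 0$, as $t\to\infty$.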