We explore the theoretical possibility of learning $d$-dimensional targets with $W$-parameter models by gradient flow (GF) when $W<d$. Our main result shows that if the targets are drawn from a particular $d$-dimensional probability distribution, then there exist models with as few as two parameters that learn the targets with arbitrarily high success probability. On the other hand, we show that for $W<d$ there is necessarily a large subset of GF-non-learnable targets. In particular, the set of learnable targets is not dense in $\mathbb R^d$, and any subset of $\mathbb R^d$ homeomorphic to the $W$-dimensional sphere contains non-learnable targets. Finally, we observe that the model in our main theorem on almost-guaranteed two-parameter learning is constructed using a hierarchical procedure and, as a result, is not expressible by a single elementary function. We show that this limitation is essential, in the sense that most models written in terms of elementary functions cannot achieve the learnability demonstrated in this theorem.
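To fix notation, the setting can be sketched as follows (a minimal formalization; the model map $\Phi$, the parameter trajectory $w(t)$, and the squared-error loss are illustrative assumptions, not necessarily the exact definitions used in the body of the paper). A model is a differentiable map $\Phi:\mathbb R^W\to\mathbb R^d$, a target is a point $f\in\mathbb R^d$, and GF learning means that the parameter trajectory drives the loss to zero:
% Hedged sketch of the GF setup; \Phi, w(t), and the squared-error loss are assumed notation.
\[
L(w)=\tfrac12\,\lVert\Phi(w)-f\rVert^2,\qquad \dot w(t)=-\nabla L\big(w(t)\big),\qquad L\big(w(t)\big)\xrightarrow[\;t\to\infty\;]{}0.
\]
Under this reading, a target $f$ is GF-learnable from a given initialization if the flow started there drives the loss to zero; the results above concern which subsets of $\mathbb R^d$ can or cannot consist of learnable targets when $W<d$.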