We introduce a deep learning model that can universally approximate regular conditional distributions (RCDs). The proposed model operates in three phases: first, it linearizes inputs from a given metric space $\mathcal{X}$ to $\mathbb{R}^d$ via a feature map; next, a deep feedforward neural network processes these linearized features; finally, the network's outputs are transformed to the $1$-Wasserstein space $\mathcal{P}_1(\mathbb{R}^D)$ via a probabilistic extension of the attention mechanism of Bahdanau et al.\ (2014). Our model, called the \textit{probabilistic transformer (PT)}, can approximate any continuous function from $\mathbb{R}^d$ to $\mathcal{P}_1(\mathbb{R}^D)$ uniformly on compact sets, with quantitative approximation guarantees. We identify two ways in which the PT avoids the curse of dimensionality when approximating $\mathcal{P}_1(\mathbb{R}^D)$-valued functions. The first strategy constructs functions in $C(\mathbb{R}^d,\mathcal{P}_1(\mathbb{R}^D))$ that can be efficiently approximated by a PT, uniformly on any given compact subset of $\mathbb{R}^d$. In the second approach, given any function $f$ in $C(\mathbb{R}^d,\mathcal{P}_1(\mathbb{R}^D))$, we construct compact subsets of $\mathbb{R}^d$ on which $f$ can be efficiently approximated by a PT.
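The three phases can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. It assumes one plausible reading of the "probabilistic extension of attention": the network emits $N$ logits, a softmax turns them into mixture weights, and the output measure is the finite mixture $\sum_{n=1}^{N} w_n \delta_{y_n}$ of point masses at atoms $y_n \in \mathbb{R}^D$. The feature map, layer sizes, random initialization, and all function names below are hypothetical choices made only for illustration.

```python
import math
import random


def feature_map(x):
    """Phase 1 (hypothetical): map an input x to features in R^d, here d = 2."""
    return [float(x), float(x) ** 2]


def affine(W, b, v):
    """Compute W v + b for a weight matrix W (list of rows) and bias b."""
    return [sum(w * t for w, t in zip(row, v)) + b_i for row, b_i in zip(W, b)]


def relu(v):
    return [max(0.0, t) for t in v]


def softmax(v):
    """Numerically stable softmax: nonnegative weights that sum to 1."""
    m = max(v)
    exps = [math.exp(t - m) for t in v]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(rng, n_out, n_in):
    """Random Gaussian weights and zero biases (illustrative, untrained)."""
    W = [[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out


def probabilistic_transformer(x, layers, atoms):
    """Return a finite measure on R^D as a list of (weight, atom) pairs."""
    h = feature_map(x)                 # Phase 1: linearize the input
    for W, b in layers[:-1]:           # Phase 2: feedforward network
        h = relu(affine(W, b, h))
    W, b = layers[-1]
    logits = affine(W, b, h)           # one logit per atom
    weights = softmax(logits)          # Phase 3 (assumed form): mixture weights
    return list(zip(weights, atoms))


def mean_of_measure(measure):
    """Mean of sum_n w_n * delta_{y_n}, computed coordinate by coordinate."""
    dim = len(measure[0][1])
    return [sum(w * y[k] for w, y in measure) for k in range(dim)]


rng = random.Random(0)
d, hidden, N, D = 2, 8, 4, 3           # illustrative sizes
layers = [init_layer(rng, hidden, d), init_layer(rng, N, hidden)]
atoms = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(N)]

measure = probabilistic_transformer(0.5, layers, atoms)
mean = mean_of_measure(measure)
```

Because the output is a finitely supported probability measure, it has a finite first moment and therefore lies in $\mathcal{P}_1(\mathbb{R}^D)$. In this sketch the atoms are fixed and only the weights depend on the input; whether the atoms are trained or also input-dependent is left open here.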