Parametrising the Inhomogeneity Inducing Capacity of a Training Set, and its Impact on Supervised Learning

We introduce a parametrisation of the property of an available training dataset that necessitates an inhomogeneous correlation structure for the function learnt as a model of the relationship between the pair of variables whose observations comprise the training data. We refer to this parametrisation as the ``inhomogeneity parameter'' of the training set. The parameter is easy to compute for datasets ranging from small to large, and we demonstrate its computation on multiple publicly available datasets, while also showing that conventional ``non-stationarity'' of data does not imply a non-zero inhomogeneity parameter of the dataset. We prove that, within the probabilistic Gaussian Process-based learning approach, a training set with a non-zero inhomogeneity parameter makes it imperative that the process invoked to model the sought function be non-stationary. After learning a real-world multivariate function with such a process, we demonstrate that the quality and reliability of predictions at test inputs are affected by the inhomogeneity parameter of the training data.
