We present several generative and predictive algorithms based on the RKHS (reproducing kernel Hilbert space) methodology which, most importantly, scale efficiently to large datasets and high-dimensional data. It is well recognized that the RKHS methodology leads to efficient and robust algorithms for numerous tasks in data science, statistics, and scientific computing. However, the implementations existing in the literature are often difficult to scale to large datasets. In this paper, we introduce a simple and robust divide-and-conquer methodology. It applies to large-scale datasets and relies on several kernel-based algorithms, which distinguish between various extrapolation, interpolation, and optimal transport steps. We explain how to select the suitable algorithm for specific applications based on feedback from performance criteria. Our primary focus is on applications and problems arising in industrial contexts, such as generating meshes for efficient numerical simulations, designing generators for conditional distributions, constructing transition probability matrices for statistical or stochastic applications, and addressing various tasks relevant to the Artificial Intelligence community. The proposed algorithms are highly relevant to supervised and unsupervised learning, generative methods, and reinforcement learning.