CS4ML: A general framework for active learning with arbitrary data based on Christoffel functions

We introduce a general framework for active learning in regression problems. Our framework extends the standard setup by allowing for general types of data, rather than merely pointwise samples of the target function. This generalization covers many cases of practical interest, such as data acquired in transform domains (e.g., Fourier data), vector-valued data (e.g., gradient-augmented data), data acquired along continuous curves, and multimodal data (i.e., combinations of different types of measurements). Our framework considers random sampling according to a finite number of sampling measures and arbitrary nonlinear approximation spaces (model classes). We introduce the concept of generalized Christoffel functions and show how these can be used to optimize the sampling measures. We prove that this leads to near-optimal sample complexity in various important cases. This paper focuses on applications in scientific computing, where active learning is often desirable because data is usually expensive to generate. We demonstrate the efficacy of our framework for gradient-augmented learning with polynomials, Magnetic Resonance Imaging (MRI) using generative models, and adaptive sampling for solving PDEs using Physics-Informed Neural Networks (PINNs).
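To make the role of Christoffel functions concrete, the following is a minimal sketch of the classical special case only: pointwise samples, a linear polynomial space on [-1, 1], and the uniform probability measure. It is not the generalized framework of the paper. With an orthonormal basis phi_0, ..., phi_{n-1}, the quantity K(x) = sum_k |phi_k(x)|^2 (the reciprocal of the classical Christoffel function) defines the sampling density K(x)/n and the least-squares weights n/K(x). The function names, the grid-based sampler, and the test function are assumptions made for illustration.

```python
import numpy as np
from numpy.polynomial import legendre


def orthonormal_legendre(x, n):
    """First n Legendre polynomials at x, orthonormal with respect to the
    uniform probability measure dx/2 on [-1, 1]."""
    # legvander returns P_0..P_{n-1}; since the integral of P_k^2 against
    # dx/2 equals 1/(2k+1), scaling by sqrt(2k+1) normalizes each column.
    V = legendre.legvander(np.asarray(x, dtype=float), n - 1)
    return V * np.sqrt(2 * np.arange(n) + 1)


def christoffel(x, n):
    """K(x) = sum_k |phi_k(x)|^2 for the n-dimensional polynomial space."""
    return np.sum(orthonormal_legendre(x, n) ** 2, axis=1)


def sample_christoffel(m, n, rng, grid_size=20001):
    """Draw m points from the density K(x)/n (relative to dx/2).

    Assumption: the continuous measure is approximated by a fine uniform
    grid, which keeps the sampler short at the cost of a small bias.
    """
    grid = np.linspace(-1.0, 1.0, grid_size)
    p = christoffel(grid, n)
    p /= p.sum()
    return rng.choice(grid, size=m, p=p)


def weighted_least_squares(f, x, n):
    """Fit coefficients in the orthonormal basis using weights n / K(x)."""
    A = orthonormal_legendre(x, n)
    sqrt_w = np.sqrt(n / christoffel(x, n))
    coef, *_ = np.linalg.lstsq(A * sqrt_w[:, None], f(x) * sqrt_w, rcond=None)
    return coef


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, m = 10, 60                      # space dimension, number of samples
    f = lambda t: np.exp(t) * np.sin(3 * t)   # hypothetical target function
    x = sample_christoffel(m, n, rng)
    coef = weighted_least_squares(f, x, n)
    t = np.linspace(-1.0, 1.0, 500)
    err = np.max(np.abs(orthonormal_legendre(t, n) @ coef - f(t)))
    print(f"max error on test grid: {err:.2e}")
```

In this classical setting the weights n/K(x) compensate for the non-uniform sampling, so the weighted least-squares problem remains a discretization of the original L2 norm. The abstract's generalized Christoffel functions extend this idea to other data types and to nonlinear model classes, which this sketch does not attempt to reproduce.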


