We introduce a general framework for active learning in regression problems. Our framework extends the standard setup by allowing for general types of data, rather than merely pointwise samples of the target function. This generalization covers many cases of practical interest, such as data acquired in transform domains (e.g., Fourier data), vector-valued data (e.g., gradient-augmented data), data acquired along continuous curves, and multimodal data (i.e., combinations of different types of measurements). Our framework considers random sampling according to a finite number of sampling measures and arbitrary nonlinear approximation spaces (model classes). We introduce the concept of generalized Christoffel functions and show how these can be used to optimize the sampling measures. We prove that this leads to near-optimal sample complexity in various important cases. This paper focuses on applications in scientific computing, where active learning is often desirable because data is usually expensive to generate. We demonstrate the efficacy of our framework for three problems: gradient-augmented learning with polynomials, Magnetic Resonance Imaging (MRI) using generative models, and adaptive sampling for solving PDEs using Physics-Informed Neural Networks (PINNs).
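To make the role of Christoffel functions concrete, the following is a minimal sketch of the classical special case only: pointwise samples, a single sampling measure, and a linear polynomial space on [-1, 1]. It is not the generalized construction introduced in this paper. All function names (`orthonormal_legendre`, `christoffel`, `sample_christoffel`, `weighted_least_squares`), the grid-based sampler, and the test function `exp` are assumptions chosen for illustration.

```python
import numpy as np
from numpy.polynomial import legendre


def orthonormal_legendre(x, n):
    """Matrix of the first n Legendre polynomials, orthonormal w.r.t. dx/2 on [-1, 1]."""
    V = legendre.legvander(x, n - 1)              # columns P_0, ..., P_{n-1}
    return V * np.sqrt(2 * np.arange(n) + 1)      # rescale so each column has unit norm


def christoffel(x, n):
    """Classical (reciprocal) Christoffel function K(x) = sum_i phi_i(x)^2."""
    return np.sum(orthonormal_legendre(x, n) ** 2, axis=1)


def sample_christoffel(m, n, rng, grid_size=20001):
    """Draw m points with density proportional to K(x), approximated on a fine grid."""
    grid = np.linspace(-1.0, 1.0, grid_size)
    p = christoffel(grid, n)
    p /= p.sum()
    return rng.choice(grid, size=m, p=p)


def weighted_least_squares(f, x, n):
    """Fit coefficients in the orthonormal basis using weights w(x) = n / K(x)."""
    A = orthonormal_legendre(x, n)
    sw = np.sqrt(n / christoffel(x, n) / len(x))  # square roots of the weights
    coef, *_ = np.linalg.lstsq(A * sw[:, None], f(x) * sw, rcond=None)
    return coef


rng = np.random.default_rng(0)
n, m = 10, 60                                     # space dimension, number of samples
x = sample_christoffel(m, n, rng)
coef = weighted_least_squares(np.exp, x, n)
```

The weights `n / K(x)` compensate for the non-uniform sampling density, so the weighted least-squares problem remains an unbiased discretization of the continuous one. The grid-based sampler is a simplification; an exact sampler would be preferable in practice.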