We introduce a general framework for active learning in regression problems. Our framework extends the standard setup by allowing for general types of data, rather than merely pointwise samples of the target function. This generalization covers many cases of practical interest, such as data acquired in transform domains (e.g., Fourier data), vector-valued data (e.g., gradient-augmented data), data acquired along continuous curves, and multimodal data (i.e., combinations of different types of measurements). Our framework considers random sampling according to a finite number of sampling measures and arbitrary nonlinear approximation spaces (model classes). We introduce the concept of generalized Christoffel functions and show how these can be used to optimize the sampling measures. We prove that this leads to near-optimal sample complexity in various important cases. This paper focuses on applications in scientific computing, where active learning is often desirable because generating data is usually expensive. We demonstrate the efficacy of our framework for gradient-augmented learning with polynomials, Magnetic Resonance Imaging (MRI) using generative models, and adaptive sampling for solving PDEs using Physics-Informed Neural Networks (PINNs).
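For orientation, the classical special case that this framework generalizes (pointwise samples and a linear polynomial space) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's generalized Christoffel functions or its implementation: we assume the domain [-1, 1] with the uniform probability measure, the space of polynomials of degree less than n in an orthonormal Legendre basis, and rejection sampling from the density K(x)/n, where K(x) = sum_j phi_j(x)^2 is the (reciprocal) Christoffel function. All helper names are hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre

# Hypothetical helpers illustrating classical (pointwise) Christoffel sampling.

def orthonormal_legendre(x, n):
    """First n Legendre polynomials, orthonormal w.r.t. the uniform
    probability measure on [-1, 1], evaluated at x. Shape (len(x), n)."""
    V = legendre.legvander(x, n - 1)          # columns P_0, ..., P_{n-1}
    return V * np.sqrt(2 * np.arange(n) + 1)  # rescale so E[phi_j^2] = 1

def christoffel(x, n):
    """Reciprocal Christoffel function K(x) = sum_j phi_j(x)^2."""
    return np.sum(orthonormal_legendre(x, n) ** 2, axis=1)

def christoffel_sample(m, n, rng):
    """Draw m i.i.d. points from the density K(x)/n (w.r.t. the uniform
    measure) by rejection sampling; K(x) <= n^2, attained at x = +-1."""
    samples = []
    while len(samples) < m:
        x = rng.uniform(-1.0, 1.0, size=m)
        u = rng.uniform(0.0, 1.0, size=m)
        accept = u * n**2 <= christoffel(x, n)   # accept with prob. K(x)/n^2
        samples.extend(x[accept].tolist())
    return np.array(samples[:m])

def weighted_least_squares(f, m, n, rng):
    """Fit f in the n-dimensional polynomial space from m Christoffel samples,
    using weights w(x) = n / K(x), the reciprocal of the sampling density."""
    x = christoffel_sample(m, n, rng)
    w = n / christoffel(x, n)
    A = orthonormal_legendre(x, n) * np.sqrt(w)[:, None]
    b = f(x) * np.sqrt(w)
    coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)
    return coeffs
```

Sampling from K(x)/n concentrates points near the endpoints, where the polynomial basis is largest; the weights undo this bias in the least-squares fit. In the classical theory, this choice is what yields near-optimal (log-linear in n) sample complexity for linear spaces.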