Least-squares approximation is one of the most important methods for recovering an unknown function from data. While in many applications the data is fixed, in many others there is substantial freedom to choose where to sample. In this paper, we review recent progress on optimal sampling for (weighted) least-squares approximation in arbitrary linear spaces. We introduce the Christoffel function as a key quantity in the analysis of (weighted) least-squares approximation from random samples, then show how it can be used to construct sampling strategies that possess near-optimal sample complexity: namely, the number of samples scales log-linearly in $n$, the dimension of the approximation space. We discuss a series of variations, extensions and further topics, and throughout highlight connections to approximation theory, machine learning, information-based complexity and numerical linear algebra. Finally, motivated by various contemporary applications, we consider a generalization of the classical setting where the samples need not be pointwise samples of a scalar-valued function, and the approximation space need not be linear. We show that even in this significantly more general setting suitable generalizations of the Christoffel function still determine the sample complexity. This provides a unified procedure for designing improved sampling strategies for general recovery problems. This article is largely self-contained, and intended to be accessible to nonspecialists.
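To make the abstract's central idea concrete, here is a minimal sketch of Christoffel-function-based weighted least squares in a simple instance. The choices below are illustrative assumptions, not taken from the paper: the domain is $[-1,1]$ with the uniform measure, the $n$-dimensional approximation space is spanned by the orthonormal Legendre polynomials $\varphi_k$, samples are drawn (by rejection sampling) from the density proportional to the inverse Christoffel function $K(x) = \sum_k \varphi_k(x)^2$, and the least-squares problem is weighted by $w(x) = n / K(x)$.

```python
import numpy as np
from numpy.polynomial.legendre import legvander

rng = np.random.default_rng(0)

def basis(x, n):
    """Orthonormal Legendre basis on [-1, 1] w.r.t. the uniform measure dx/2."""
    V = legvander(x, n - 1)                    # columns P_0, ..., P_{n-1}
    return V * np.sqrt(2 * np.arange(n) + 1)   # phi_k = sqrt(2k+1) * P_k

def christoffel_sample(n, m):
    """Draw m points from the density (K(x)/n) * dx/2, where
    K(x) = sum_k phi_k(x)^2 is the reciprocal of the Christoffel function.
    Rejection sampling from the uniform proposal works since K(x) <= n^2."""
    pts = []
    while len(pts) < m:
        x = rng.uniform(-1, 1, size=4 * m)
        K = np.sum(basis(x, n) ** 2, axis=1)
        accept = rng.uniform(0, 1, size=x.size) < K / n**2
        pts.extend(x[accept])
    return np.array(pts[:m])

def weighted_lsq(f, n, m):
    """Weighted least-squares fit of f with weights w(x) = n / K(x)."""
    x = christoffel_sample(n, m)
    Phi = basis(x, n)
    w = n / np.sum(Phi ** 2, axis=1)
    sw = np.sqrt(w)
    coeffs, *_ = np.linalg.lstsq(sw[:, None] * Phi, sw * f(x), rcond=None)
    return coeffs

# Illustration: fit f(x) = exp(x) with n = 8 modes and m ~ n log n samples.
n, m = 8, 67
coeffs = weighted_lsq(np.exp, n, m)
xt = np.linspace(-1, 1, 500)
err = np.max(np.abs(basis(xt, n) @ coeffs - np.exp(xt)))
print(f"max error on test grid: {err:.2e}")
```

The weights cancel the non-uniformity of the sampling density, which is what yields the log-linear sample complexity in $n$ discussed in the abstract; with plain uniform sampling and no weights, many more samples would be needed for a stable fit near the endpoints.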