Uniform laws of large numbers form a cornerstone of Vapnik--Chervonenkis theory, where they are characterized by the finiteness of the VC dimension. In this work, we study uniform convergence phenomena in Cartesian product spaces, under assumptions on the underlying distribution that are compatible with the product structure. Specifically, we assume that the distribution is absolutely continuous with respect to the product of its marginals, a condition that captures many natural settings, including product distributions, sparse mixtures of product distributions, and distributions with low mutual information. We show that, under this assumption, a uniform law of large numbers holds for a family of events if and only if the linear VC dimension of the family is finite. The linear VC dimension is defined as the maximum size of a shattered set that lies on an axis-parallel line, namely, a set of vectors that agree on all but at most one coordinate. This dimension is always at most the classical VC dimension, yet it can be arbitrarily smaller. For instance, the family of convex sets in $\mathbb{R}^d$ has linear VC dimension $2$, while its VC dimension is infinite already for $d\ge 2$. Our proofs rely on an estimator that departs substantially from the standard empirical mean estimator and exhibits a more intricate structure. We show that such deviations from the standard empirical mean estimator are unavoidable in this setting. Throughout the paper, we propose several open questions, with a particular focus on quantitative sample complexity bounds.
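To make the definition of the linear VC dimension concrete, the following is a minimal brute-force sketch on a finite grid. It is a toy illustration, not a construction taken from the paper: the function names (`is_shattered`, `vc_dimension`, `axis_parallel_lines`, `linear_vc_dimension`) and the example family (all subsets of the diagonal of a $3\times 3$ grid) are assumptions chosen for illustration. The example shows a family whose classical VC dimension equals the grid side length while its linear VC dimension is $1$.

```python
from itertools import combinations, product


def is_shattered(points, family):
    """Return True if `family` realizes all 2^k labelings of the k given points."""
    patterns = {tuple(p in s for p in points) for s in family}
    return len(patterns) == 2 ** len(points)


def vc_dimension(domain, family):
    """Brute-force VC dimension of `family` restricted to the finite `domain`."""
    best = 0
    for k in range(1, len(domain) + 1):
        if any(is_shattered(c, family) for c in combinations(domain, k)):
            best = k
        else:
            # Subsets of shattered sets are shattered, so no larger set can work.
            break
    return best


def axis_parallel_lines(domain, dim):
    """Group domain points into axis-parallel lines.

    Two points lie on the same line along coordinate i if they agree on
    every coordinate other than i.
    """
    lines = {}
    for p in domain:
        for i in range(dim):
            key = (i, p[:i] + p[i + 1:])
            lines.setdefault(key, []).append(p)
    return list(lines.values())


def linear_vc_dimension(domain, family, dim):
    """Largest shattered set contained in a single axis-parallel line."""
    return max(
        vc_dimension(line, family) for line in axis_parallel_lines(domain, dim)
    )


# Hypothetical example: all subsets of the diagonal of an n x n grid.
n = 3
domain = list(product(range(n), repeat=2))
diagonal = [(i, i) for i in range(n)]
family = [
    frozenset(subset)
    for k in range(n + 1)
    for subset in combinations(diagonal, k)
]

print("VC dimension:", vc_dimension(domain, family))
print("linear VC dimension:", linear_vc_dimension(domain, family, dim=2))
```

In this toy family the diagonal itself is shattered, so the classical VC dimension is $n$, whereas every axis-parallel line meets the diagonal in exactly one point, so no two points on a line can both be included and the linear VC dimension stays at $1$. The enumeration is exponential in the domain size and is only meant for very small grids.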