Trade-off Between Dependence and Complexity for Nonparametric Learning -- an Empirical Process Approach

Empirical process theory for i.i.d. observations has emerged as a ubiquitous tool for understanding the generalization properties of various statistical problems. However, in many applications where the data exhibit temporal dependencies (e.g., in finance, medical imaging, and weather forecasting), the corresponding empirical processes are much less understood. Motivated by this observation, we present a general bound on the expected supremum of empirical processes under standard $\beta/\rho$-mixing assumptions. Unlike most prior work, our results cover both the long-range and short-range regimes of dependence. Our main result shows that a non-trivial trade-off between the complexity of the underlying function class and the dependence among the observations characterizes the learning rate in a large class of nonparametric problems. This trade-off reveals a new phenomenon: even under long-range dependence, it is possible to attain the same rates as in the i.i.d. setting, provided the underlying function class is complex enough. We demonstrate the practical implications of our findings by analyzing various statistical estimators in both fixed and growing dimensions. Our main examples include a comprehensive case study of generalization error bounds in nonparametric regression over smoothness classes in fixed as well as growing dimension using neural nets, shape-restricted multivariate convex regression, estimation of the optimal transport (Wasserstein) distance between two probability distributions, and classification under the Mammen-Tsybakov margin condition, all under appropriate mixing assumptions. In the process, we also develop bounds on $L_r$-localized empirical processes ($1\le r\le 2$) with dependent observations, which we then leverage to obtain faster rates for (a) tuning-free adaptation and (b) set-structured learning problems.
