Covariate shift in regression problems, and the associated distribution mismatch between training and test data, is a commonly encountered phenomenon in machine learning. In this paper, we extend recent results on nonparametric convergence rates for i.i.d. data to Markovian dependence structures. We demonstrate that, under Hölder smoothness assumptions on the regression function, convergence rates for the generalization risk of a Nadaraya-Watson kernel estimator are determined by the similarity between the invariant distributions of the source and target Markov chains. This similarity is captured explicitly by a bandwidth-dependent similarity measure recently introduced in Pathak, Ma and Wainwright [ICML, 2022]. Precise convergence rates are derived for the particular cases of finite Markov chains and spectral gap Markov chains for which the similarity measure between their invariant distributions grows polynomially with decreasing bandwidth. For the latter, we extend the notion of a distribution transfer exponent from Kpotufe and Martinet [Ann. Stat., 49(6), 2021] to kernel transfer exponents of uniformly ergodic Markov chains, which generates a rich class of Markov kernel pairs for which convergence guarantees for the covariate shift problem can be formulated.
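To make the setting concrete, the following is a minimal sketch of a Nadaraya-Watson estimator fitted on a source Markov chain and evaluated on a target chain. Everything specific here is an assumption made for illustration and is not taken from the paper: the box kernel, the one-dimensional AR(1) chains, the regression function `sin`, the noise level, and the helper names `nadaraya_watson`, `simulate_ar1` and `demo`.

```python
import numpy as np


def nadaraya_watson(x_train, y_train, x_query, h):
    """Box-kernel Nadaraya-Watson estimate at each query point (1-D covariates)."""
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_query = np.atleast_1d(np.asarray(x_query, dtype=float))
    # weights[i, j] = 1 if training point j lies within distance h of query i.
    weights = (np.abs(x_query[:, None] - x_train[None, :]) <= h).astype(float)
    counts = weights.sum(axis=1)
    sums = weights @ y_train
    # Illustrative convention: predict 0 where the window contains no training point.
    return np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)


def simulate_ar1(n, phi, sigma, rng, x0=0.0):
    """Simulate X_t = phi * X_{t-1} + sigma * eps_t, a toy real-valued Markov chain."""
    x = np.empty(n)
    prev = x0
    for t in range(n):
        prev = phi * prev + sigma * rng.standard_normal()
        x[t] = prev
    return x


def demo(seed=0, n=1000, h=0.2):
    """Fit on a source chain, then measure squared error along a shifted target chain."""
    rng = np.random.default_rng(seed)
    f = np.sin  # hypothetical smooth regression function
    x_src = simulate_ar1(n, phi=0.5, sigma=1.0, rng=rng)
    y_src = f(x_src) + 0.1 * rng.standard_normal(n)
    # The target chain has a different invariant distribution; f is unchanged.
    x_tgt = simulate_ar1(n, phi=0.8, sigma=0.5, rng=rng)
    pred = nadaraya_watson(x_src, y_src, x_tgt, h)
    return float(np.mean((pred - f(x_tgt)) ** 2))
```

Calling `demo()` with different bandwidths `h` gives a rough empirical feel for how the target risk depends on the bandwidth and on how well the source chain covers the regions visited by the target chain. It is only a toy experiment and does not reproduce the rates derived in the paper.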