Suppose we observe a trajectory of length $n$ from an $\alpha$-mixing stochastic process over a finite but potentially large state space. We consider the problem of estimating the probability mass placed by the stationary distribution of any such process on elements that occur with a certain frequency in the observed sequence. We estimate this vector of probabilities in total variation distance, showing universal consistency in $n$ and recovering known results for i.i.d. sequences as special cases. Our proposed methodology carefully combines the plug-in (or empirical) estimator with a recently proposed modification of the Good--Turing estimator called WingIt, which was originally developed for Markovian sequences. En route to controlling the error of our estimator, we develop new performance bounds on WingIt and the plug-in estimator for $\alpha$-mixing stochastic processes. Importantly, the widely used method of Poissonization can no longer be applied in our non-i.i.d. setting, and so we develop complementary tools, including concentration inequalities for a natural self-normalized statistic of mixing sequences, that may prove independently useful in the design and analysis of estimators for related problems.
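To make the estimation target concrete, the following is a minimal sketch of the two classical building blocks named above, in their textbook i.i.d. forms: the plug-in estimate $k\,\Phi_k/n$ and the Good--Turing estimate $(k+1)\,\Phi_{k+1}/n$ of the mass on symbols seen exactly $k$ times, where $\Phi_k$ denotes the number of distinct symbols appearing exactly $k$ times. The function names are hypothetical, and the sketch does not reproduce WingIt or the combined estimator analyzed here; it only illustrates the quantities involved.

```python
from collections import Counter


def count_profile(seq):
    """Return phi, where phi[k] is the number of distinct symbols seen exactly k times."""
    symbol_counts = Counter(seq)            # N_x for each observed symbol x
    return Counter(symbol_counts.values())  # Phi_k for each observed count k


def plug_in_mass(seq, k):
    """Plug-in (empirical) estimate of the mass on symbols seen exactly k times."""
    phi = count_profile(seq)
    return k * phi[k] / len(seq)


def good_turing_mass(seq, k):
    """Classical (i.i.d.) Good-Turing estimate of the mass on symbols seen exactly k times."""
    phi = count_profile(seq)
    return (k + 1) * phi[k + 1] / len(seq)


if __name__ == "__main__":
    # Toy sequence, purely illustrative (assumption: symbols are characters).
    seq = list("abracadabra")
    for k in range(3):
        print(k, plug_in_mass(seq, k), good_turing_mass(seq, k))
```

Note that the plug-in estimate of the missing mass ($k = 0$) is always zero, which is one reason Good--Turing-type corrections are used for small $k$.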