We derive general bounds on the probability that the empirical first-passage time $\overline{\tau}_n\equiv \sum_{i=1}^n\tau_i/n$ of a reversible ergodic Markov process inferred from a sample of $n$ independent realizations deviates from the true mean first-passage time by more than any given amount in either direction. We construct non-asymptotic confidence intervals that hold in the elusive small-sample regime and thus fill the gap between asymptotic methods and the Bayesian approach that is known to be sensitive to prior belief and tends to underestimate uncertainty in the small-sample setting. We prove sharp bounds on extreme first-passage times that control uncertainty even in cases where the mean alone does not sufficiently characterize the statistics. Our concentration-of-measure-based results allow for model-free error control and reliable error estimation in kinetic inference, and are thus important for the analysis of experimental and simulation data in the presence of limited sampling.
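To make the central quantity concrete, the following is a minimal sketch under stated assumptions; it is not the method or the bounds of this work. It assumes a toy model chosen purely for illustration: a discrete-time symmetric random walk on $\{0,\dots,N\}$, reflecting at $0$ and absorbed at the target $N$, which is a reversible birth-death chain whose exact mean first-passage time from $0$ is $N^2$. The function names (`first_passage_time`, `empirical_mean_fpt`, `deviation_probability`) are hypothetical. The sketch draws $n$ independent first-passage times $\tau_i$, forms $\overline{\tau}_n$, and estimates by brute-force Monte Carlo how often $\overline{\tau}_n$ deviates from the true mean by at least a given amount.

```python
import random


def first_passage_time(target, rng):
    """Number of steps a symmetric walk started at 0 needs to first reach `target`.

    Assumed toy model (not from the source): states 0..target, reflecting at 0.
    """
    position, steps = 0, 0
    while position != target:
        if position == 0:
            position = 1  # reflecting boundary: the walk must step right
        else:
            position += rng.choice((-1, 1))  # unbiased step left or right
        steps += 1
    return steps


def exact_mean_fpt(target):
    """Exact mean first-passage time from 0 to `target` for this toy walk."""
    return target ** 2


def empirical_mean_fpt(n, target, rng):
    """Empirical first-passage time: the average of n independent realizations."""
    return sum(first_passage_time(target, rng) for _ in range(n)) / n


def deviation_probability(n, target, threshold, trials, rng):
    """Monte Carlo estimate of P(|tau_bar_n - <tau>| >= threshold).

    This is a brute-force estimate for illustration only; it is not one of
    the concentration bounds derived in the paper.
    """
    exact = exact_mean_fpt(target)
    hits = sum(
        abs(empirical_mean_fpt(n, target, rng) - exact) >= threshold
        for _ in range(trials)
    )
    return hits / trials


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs agree
    for n in (3, 10, 100):
        p = deviation_probability(n, target=5, threshold=10, trials=2000, rng=rng)
        print(f"n = {n:3d}: estimated deviation probability {p:.3f}")
```

Running the script for a few values of $n$ shows how the deviation probability depends on the sample size in this toy setting. Such a simulation needs the true mean and many repeated samples, which is exactly what is unavailable when only a small sample exists.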