Overlapping Batch Confidence Intervals on Statistical Functionals Constructed from Time Series: Application to Quantiles, Optimization, and Estimation

July 17, 2023

Ziwei Su, Raghu Pasupathy, Yingchieh Yeh, Peter W. Glynn

From arXiv; 43 pages, 4 figures

We propose a general-purpose confidence interval procedure (CIP) for statistical functionals constructed using data from a stationary time series. The procedures we propose are based on derived distribution-free analogues of the $\chi^2$ and Student's $t$ random variables for the statistical functional context, and hence apply in a wide variety of settings including quantile estimation, gradient estimation, M-estimation, CVaR estimation, and arrival process rate estimation, apart from more traditional statistical settings. Like the method of subsampling, we use overlapping batches of time series data to estimate the underlying variance parameter; unlike subsampling and the bootstrap, however, we assume that the implied point estimator of the statistical functional obeys a central limit theorem (CLT) to help identify the weak asymptotics (called OB-x limits, x = I, II, III) of batched Studentized statistics. The OB-x limits, certain functionals of the Wiener process parameterized by the size of the batches and the extent of their overlap, form the essential machinery for characterizing dependence, and consequently the correctness of the proposed CIPs. The message from extensive numerical experimentation is that in settings where a functional CLT on the point estimator is in effect, using \emph{large overlapping batches} alongside OB-x critical values yields confidence intervals that are often of significantly higher quality than those obtained from more generic methods like subsampling or the bootstrap. We illustrate using examples from CVaR estimation, ARMA parameter estimation, and NHPP rate estimation; R and MATLAB code for OB-x critical values is available at~\texttt{web.ics.purdue.edu/~pasupath/}.
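
To make the overlapping-batch idea concrete, the following is a minimal Python sketch, not the authors' procedure. It applies a functional to overlapping batches of a series and forms a variance estimate whose scaling is chosen to match the classical overlapping batch means estimator when the functional is the sample mean and the batch offset is 1. All function and parameter names (`overlapping_batch_estimates`, `ob_variance_estimate`, `confidence_interval`, `batch_size`, `offset`, `critical_value`) are hypothetical. The exact normalization used in the paper and the OB-x critical values themselves are not reproduced here; the critical value is assumed to be supplied by the caller, for example from the authors' R or MATLAB code.

```python
import numpy as np


def overlapping_batch_estimates(x, functional, batch_size, offset=1):
    """Apply `functional` to the batches x[s : s + batch_size], s = 0, offset, 2*offset, ..."""
    x = np.asarray(x, dtype=float)
    starts = range(0, len(x) - batch_size + 1, offset)
    return np.array([functional(x[s:s + batch_size]) for s in starts])


def ob_variance_estimate(x, functional, batch_size, offset=1):
    """Overlapping-batch estimate of the variance parameter sigma^2.

    Assumed scaling: m / (1 - m/n) times the mean squared deviation of the
    batch estimates from the full-sample estimate. For the sample mean with
    offset 1 this equals the classical overlapping batch means estimator.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    if not 1 <= batch_size < n:
        raise ValueError("batch_size must satisfy 1 <= batch_size < len(x)")
    theta_n = functional(x)  # point estimate from the full series
    theta_batches = overlapping_batch_estimates(x, functional, batch_size, offset)
    scale = batch_size / (1.0 - batch_size / n)
    return scale * np.mean((theta_batches - theta_n) ** 2)


def confidence_interval(x, functional, batch_size, offset, critical_value):
    """Symmetric interval: theta_n +/- critical_value * sqrt(sigma2_hat / n).

    `critical_value` is caller-supplied (assumption); with large batches a
    normal quantile is generally not appropriate.
    """
    x = np.asarray(x, dtype=float)
    theta_n = functional(x)
    sigma2_hat = ob_variance_estimate(x, functional, batch_size, offset)
    half_width = critical_value * np.sqrt(sigma2_hat / len(x))
    return theta_n - half_width, theta_n + half_width
```

For a quantile functional, one could pass `np.median` (or a small wrapper around `np.quantile`) as `functional`. The design point this sketch tries to reflect is that the batch size and offset are free parameters, and the appropriate critical value depends on both.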
