Let $f:{\mathbb R}_+\to {\mathbb R}$ be a smooth function with $f(0)=0.$ The problem of estimating the functional $\tau_f(\Sigma):= {\rm tr}(f(\Sigma))$ of an unknown covariance operator $\Sigma$ in a separable Hilbert space ${\mathbb H}$ based on i.i.d. mean zero Gaussian observations $X_1,\dots, X_n$ with values in ${\mathbb H}$ and covariance operator $\Sigma$ is studied. Let $\hat \Sigma_n$ be the sample covariance operator based on the observations $X_1,\dots, X_n.$ Estimators \begin{align*} \hat T_{f,m}(X_1,\dots, X_n):= \sum_{j=1}^m C_j \tau_f(\hat \Sigma_{n_j}) \end{align*} based on linear aggregation of several plug-in estimators $\tau_f(\hat \Sigma_{n_j}),$ where the sample sizes $n/c\leq n_1<\dots<n_m\leq n$ and the coefficients $C_1,\dots, C_m$ are chosen to reduce the bias, are considered. The complexity of the problem is characterized by the effective rank ${\bf r}(\Sigma):= \frac{{\rm tr}(\Sigma)}{\|\Sigma\|}$ of the covariance operator $\Sigma.$ It is shown that, if $f\in C^{m+1}({\mathbb R}_+)$ for some $m\geq 2,$ $\|f''\|_{L_{\infty}}\lesssim 1,$ $\|f^{(m+1)}\|_{L_{\infty}}\lesssim 1,$ $\|\Sigma\|\lesssim 1$ and ${\bf r}(\Sigma)\lesssim n,$ then \begin{align*} & \|\hat T_{f,m}(X_1,\dots, X_n)-\tau_f(\Sigma)\|_{L_2} \lesssim_m \frac{\|\Sigma f'(\Sigma)\|_2}{\sqrt{n}} + \frac{{\bf r}(\Sigma)}{n}+ {\bf r}(\Sigma)\Bigl(\sqrt{\frac{{\bf r}(\Sigma)}{n}}\Bigr)^{m+1}. \end{align*} Similar bounds are proved for the $L_{p}$-errors and some other Orlicz norm errors of the estimator $\hat T_{f,m}(X_1,\dots, X_n).$ The optimality of these error rates, other estimators for which asymptotic efficiency is achieved, and uniform bounds over classes of smooth test functions $f$ are also discussed.
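To make the construction concrete, the following is a minimal finite-dimensional sketch of the effective rank, the plug-in estimator $\tau_f(\hat \Sigma_{n})$ and a linear aggregation of plug-in estimators. It is not taken from the paper. In particular, the rule used to pick the coefficients (solve $\sum_j C_j = 1$ and $\sum_j C_j n_j^{-k} = 0$ for $k=1,\dots,m-1$) is an assumption, motivated by a bias expansion in powers of $1/n$; the abstract only states that the coefficients are chosen to reduce the bias. The function names and the use of the first $n_j$ observations for $\hat \Sigma_{n_j}$ are likewise illustrative choices.

```python
import numpy as np


def effective_rank(sigma):
    """Effective rank r(Sigma) = tr(Sigma) / ||Sigma|| (operator norm)."""
    eig = np.linalg.eigvalsh(sigma)
    return float(eig.sum() / eig.max())


def plug_in_trace(f, x):
    """Plug-in estimator tr(f(Sigma_hat)) for mean-zero observations (rows of x)."""
    n = x.shape[0]
    sigma_hat = x.T @ x / n  # sample covariance, mean assumed known to be zero
    # Clip tiny negative eigenvalues caused by floating-point round-off.
    eig = np.clip(np.linalg.eigvalsh(sigma_hat), 0.0, None)
    return float(np.sum(f(eig)))


def aggregation_coefficients(sizes):
    """ASSUMED rule: sum_j C_j = 1 and sum_j C_j * n_j**(-k) = 0 for k = 1..m-1."""
    sizes = np.asarray(sizes, dtype=float)
    m = len(sizes)
    # Rescale by the largest size for numerical stability; the constraints
    # sum_j C_j * (n_max / n_j)**k = 0 are equivalent to the ones above.
    ratios = sizes.max() / sizes
    a = np.vander(ratios, m, increasing=True).T  # row k holds ratios**k
    b = np.zeros(m)
    b[0] = 1.0
    return np.linalg.solve(a, b)


def aggregated_estimator(f, x, sizes):
    """Linear aggregation sum_j C_j * tr(f(Sigma_hat_{n_j})) over the first n_j rows."""
    coeffs = aggregation_coefficients(sizes)
    return float(sum(c * plug_in_trace(f, x[:n_j]) for c, n_j in zip(coeffs, sizes)))
```

For example, with `x` an `(n, d)` array of mean-zero samples, `aggregated_estimator(np.log1p, x, [n // 2, 3 * n // 4, n])` aggregates three plug-in estimators of ${\rm tr}(\log(I+\Sigma))$; here `np.log1p` satisfies $f(0)=0$ as required.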