We study adaptive variable selection in a Gaussian white noise model of intensity $\varepsilon$ under sparsity and regularity conditions on an unknown regression function $f$. The $d$-variate regression function $f$ is assumed to be a sum of functions, each depending on a smaller number $k$ of variables ($1 \leq k \leq d$). These component functions are unknown, and only a few of them are nonzero. We assume that $d=d_\varepsilon \to \infty$ as $\varepsilon \to 0$ and consider two cases: $k$ fixed, and $k=k_\varepsilon \to \infty$ with $k=o(d)$ as $\varepsilon \to 0$. We introduce an adaptive selection procedure that, under some model assumptions, exactly identifies all nonzero $k$-variate components of $f$. In addition, we establish conditions under which exact identification of the nonzero components is impossible. These conditions ensure that the proposed selection procedure is the best possible in the asymptotically minimax sense with respect to the Hamming risk.