We study the problem of adaptive variable selection in a Gaussian white noise model of intensity $\varepsilon$ under certain sparsity and regularity conditions on an unknown regression function $f$. The $d$-variate regression function $f$ is assumed to be a sum of functions, each depending on a smaller number $k$ of variables ($1 \leq k \leq d$). These functions are unknown to us, and only a few of them are non-zero. We assume that $d=d_\varepsilon \to \infty$ as $\varepsilon \to 0$ and consider two cases: $k$ fixed, and $k=k_\varepsilon \to \infty$ with $k=o(d)$ as $\varepsilon \to 0$. In this work, we introduce an adaptive selection procedure that, under some model assumptions, exactly identifies all non-zero $k$-variate components of $f$. In addition, we establish conditions under which exact identification of the non-zero components is impossible. These conditions ensure that the proposed selection procedure is the best possible in the asymptotically minimax sense with respect to the Hamming risk.