Let $X$ be a $p$-variate random vector and $\widetilde{X}$ a knockoff copy of $X$ (in the sense of \cite{CFJL18}). A new approach for constructing $\widetilde{X}$ (henceforth, NA) was introduced in \cite{JSPI}. NA has three main advantages: (i) building $\widetilde{X}$ is straightforward; (ii) the joint distribution of $(X,\widetilde{X})$ can be written in closed form; (iii) $\widetilde{X}$ is often optimal under various criteria. However, for NA to apply, $X_1,\ldots,X_p$ must be conditionally independent given some random element $Z$. Our first result is that any probability measure $\mu$ on $\mathbb{R}^p$ can be approximated by a probability measure $\mu_0$ of the form $$\mu_0\bigl(A_1\times\ldots\times A_p\bigr)=E\Bigl\{\prod_{i=1}^p P(X_i\in A_i\mid Z)\Bigr\}.$$ The approximation is in total variation distance when $\mu$ is absolutely continuous, and an explicit formula for $\mu_0$ is provided. If $X\sim\mu_0$, then $X_1,\ldots,X_p$ are conditionally independent given $Z$. Hence, up to a negligible error, one can assume $X\sim\mu_0$ and build $\widetilde{X}$ through NA. Our second result is a characterization of the knockoffs $\widetilde{X}$ obtained via NA: $\widetilde{X}$ is of this type if and only if the pair $(X,\widetilde{X})$ can be extended to an infinite sequence satisfying certain invariance conditions. The basic tool for proving this fact is de Finetti's theorem for partially exchangeable sequences. In addition, an explicit formula for the conditional distribution of $\widetilde{X}$ given $X$ is obtained in a few cases, one of which assumes $X_i\in\{0,1\}$ for all $i$.
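As a purely illustrative instance of the mixture form of $\mu_0$ (the notation $\theta_i$ is introduced here as an assumption and is not taken from the source), suppose $X_i\in\{0,1\}$ for all $i$ and write $\theta_i(Z)=P(X_i=1\mid Z)$. Taking $A_i=\{x_i\}$ in the formula for $\mu_0$ gives, for every $x=(x_1,\ldots,x_p)\in\{0,1\}^p$ and with the convention $0^0=1$, $$\mu_0\bigl(\{x\}\bigr)=E\Bigl\{\prod_{i=1}^p \theta_i(Z)^{x_i}\,\bigl(1-\theta_i(Z)\bigr)^{1-x_i}\Bigr\}.$$ For example, if $p=2$ and $\theta_1=\theta_2=\theta$, then $\mu_0\bigl(\{(1,1)\}\bigr)=E\bigl\{\theta(Z)^2\bigr\}$ and $\mu_0\bigl(\{(1,0)\}\bigr)=E\bigl\{\theta(Z)\bigl(1-\theta(Z)\bigr)\bigr\}$, so that $\mathrm{Cov}(X_1,X_2)=E\bigl\{\theta(Z)^2\bigr\}-\bigl(E\{\theta(Z)\}\bigr)^2=\mathrm{Var}\bigl(\theta(Z)\bigr)$. In this sketch, the dependence between $X_1$ and $X_2$ arises only through the randomness of $\theta(Z)$.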