Experimental design is a classical problem in statistics whose aim is to estimate an unknown $m$-dimensional vector $\beta$ from linear measurements, each corrupted by Gaussian noise. In the combinatorial experimental design problem, the goal is to pick $k$ out of $n$ given experiments so that the resulting estimate $\hat{\beta}$ of the unknown parameters is as accurate as possible. In this paper, we study one of the most robust measures of estimation error, the $D$-optimality criterion, which corresponds to minimizing the volume of the confidence ellipsoid for the estimation error $\beta-\hat{\beta}$. The problem has two natural variants, depending on whether or not experiments may be repeated. We first propose an algorithm that achieves a $\frac1e$-approximation for the $D$-optimal design problem both with and without repetitions, giving the first constant-factor approximation for the problem. We then analyze a sampling-based approximation algorithm and prove that it achieves a $(1-\epsilon)$-approximation whenever $k\geq \frac{4m}{\epsilon}+\frac{12}{\epsilon^2}\log(\frac{1}{\epsilon})$, for any $\epsilon \in (0,1)$. Finally, for $D$-optimal design with repetitions, we study a different algorithm from the literature and show that it improves this asymptotic approximation ratio.
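To make the $D$-optimality objective concrete, here is a small sketch. It writes $x_i \in \mathbb{R}^m$ for the measurement vector of experiment $i$ and $S$ for the chosen (multi)set of $k$ experiments; this notation is an assumption, not taken from the text above. For the least-squares estimate, $\beta-\hat{\beta}$ has covariance $\sigma^2\big(\sum_{i\in S} x_i x_i^\top\big)^{-1}$. The confidence ellipsoid's volume is therefore proportional to $\det\big(\sum_{i\in S} x_i x_i^\top\big)^{-1/2}$, so minimizing it is equivalent to maximizing the determinant. The sketch assumes the common normalization $f(S)=\det\big(\sum_{i\in S} x_i x_i^\top\big)^{1/m}$ as the quantity being approximated. The helper names (`d_objective`, `brute_force_d_optimal`, `sample_size_bound`) are hypothetical. The exhaustive search is only a reference optimum for tiny instances; it is neither the $\frac1e$-approximation algorithm nor the sampling algorithm. `sample_size_bound` evaluates the right-hand side of the condition on $k$, assuming $\log$ is the natural logarithm.

```python
import itertools
import math


def det(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [list(row) for row in matrix]  # work on a copy
    n = len(a)
    result = 1.0
    for col in range(n):
        # Use the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0  # (numerically) singular matrix
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        # Eliminate the entries below the pivot.
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def information_matrix(vectors, design):
    """Sum of outer products x_i x_i^T over the chosen experiments (a multiset)."""
    m = len(vectors[0])
    info = [[0.0] * m for _ in range(m)]
    for i in design:
        x = vectors[i]
        for r in range(m):
            for c in range(m):
                info[r][c] += x[r] * x[c]
    return info


def d_objective(vectors, design):
    """Assumed objective: det(sum_i x_i x_i^T) ** (1/m)."""
    m = len(vectors[0])
    d = det(information_matrix(vectors, design))
    return max(d, 0.0) ** (1.0 / m)  # clamp tiny negative round-off


def brute_force_d_optimal(vectors, k, repetitions=False):
    """Exhaustive search over all size-k designs; only viable for tiny n and k."""
    indices = range(len(vectors))
    if repetitions:
        candidates = itertools.combinations_with_replacement(indices, k)
    else:
        candidates = itertools.combinations(indices, k)
    best = max(candidates, key=lambda design: d_objective(vectors, design))
    return best, d_objective(vectors, best)


def sample_size_bound(m, eps):
    """Right-hand side of k >= 4m/eps + (12/eps^2) log(1/eps), natural log assumed."""
    return 4 * m / eps + 12 / eps ** 2 * math.log(1 / eps)


# Hypothetical tiny instance: m = 2, n = 4 candidate experiments.
vectors = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 0.0)]
design, value = brute_force_d_optimal(vectors, k=2)
print(design, value)
print(math.ceil(sample_size_bound(m=10, eps=0.5)))
```

Brute force enumerates $\binom{n}{k}$ designs without repetitions (or $\binom{n+k-1}{k}$ with repetitions). It is therefore only useful as ground truth when measuring how close an approximation algorithm's $f(S)$ comes to the optimum on small instances. Allowing repetitions can only enlarge the feasible set, so its optimum is never smaller.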