Advances in technology make large data sets increasingly available, which makes data reduction and informative subsampling ever more important. In this paper, we construct $D$-optimal subsampling designs for polynomial regression in one covariate for invariant distributions of the covariate. We study quadratic regression more closely for specific distributions. In particular, we describe the shape of the resulting optimal subsampling designs and the effect of the subsample size on the design. To illustrate the advantage of the optimal subsampling designs, we examine the efficiency of uniform random subsampling.