Sketching algorithms use random projections to generate a smaller sketched data set, often for the purposes of modelling. Complete and partial sketch regression estimates can be constructed using information from only the sketched data set or from a combination of the full and sketched data sets, respectively. Previous work has obtained the distribution of these estimators under repeated sketching, along with the first two moments for both estimators. Using a different approach, we also derive the distribution of the complete sketch estimator, and additionally consider the error term under both repeated sketching and repeated sampling. Importantly, we obtain pivotal quantities that are based solely on the sketched data set and, specifically, do not require information from the full data model fit. These pivotal quantities can be used for inference on the full data set regression estimates or the model parameters. For partial sketching, we derive pivotal quantities for a marginal test and an approximate distribution for the partial sketch estimator under repeated sketching or repeated sampling, again avoiding reliance on a full data model fit. We extend these results to include the Hadamard and Clarkson-Woodruff sketches, then compare them in a simulation study.
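As a rough illustration of the complete and partial sketch estimates described above, the following minimal sketch assumes a Gaussian sketching matrix S with i.i.d. N(0, 1/k) entries (so that E[SᵀS] = I) and the commonly used forms (X̃ᵀX̃)⁻¹X̃ᵀỹ for the complete sketch and (X̃ᵀX̃)⁻¹Xᵀy for the partial sketch, where X̃ = SX and ỹ = Sy. The function names, the scaling of S, and the simulated data are assumptions made for illustration only; they are not taken from the paper.

```python
import numpy as np


def gaussian_sketch(k, n, rng):
    """k x n sketching matrix with i.i.d. N(0, 1/k) entries (assumed scaling)."""
    return rng.normal(loc=0.0, scale=1.0 / np.sqrt(k), size=(k, n))


def full_ols(X, y):
    """Ordinary least squares on the full data set."""
    return np.linalg.solve(X.T @ X, X.T @ y)


def complete_sketch(X, y, S):
    """Uses only the sketched data (SX, Sy)."""
    Xs, ys = S @ X, S @ y
    return np.linalg.solve(Xs.T @ Xs, Xs.T @ ys)


def partial_sketch(X, y, S):
    """Uses the sketched Gram matrix together with the full-data X^T y."""
    Xs = S @ X
    return np.linalg.solve(Xs.T @ Xs, X.T @ y)


# Hypothetical simulated data, for illustration only.
rng = np.random.default_rng(0)
n, p, k = 2000, 3, 400
X = rng.normal(size=(n, p))
beta = np.array([1.0, -2.0, 0.5])
y = X @ beta + rng.normal(scale=0.5, size=n)

S = gaussian_sketch(k, n, rng)
print("full OLS:       ", full_ols(X, y))
print("complete sketch:", complete_sketch(X, y, S))
print("partial sketch: ", partial_sketch(X, y, S))
```

Redrawing S with the data held fixed corresponds to repeated sketching. Redrawing y as well corresponds to repeated sampling. When S is the identity, both estimators reduce to the full data least squares fit.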