Motivated by the prevalence of environments in which data is abundant while resources for storage and/or transmission might be scarce, we study linear regression when predictors, their squares, and responses are subject to single-bit dithered quantization. We propose an estimator that relies on plug-in estimates of the quadratic and linear terms in the quadratic program formulation of the least squares problem. We provide a non-asymptotic bound on the $\ell_2$-estimation error of this estimator and obtain its asymptotic distribution when the number of predictors is fixed, which can be used for inference and for investigating the mean squared error efficiency relative to the ordinary least squares estimator. It is shown that for the quantization protocol under consideration, substantial improvements over the proposed estimator cannot be expected. A compression pipeline in which the underlying data is first sketched and then quantized can be studied within our framework as well. We also present an extension to address high-dimensional predictors. Numerical experiments with synthetic data complement our theoretical findings.
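To make the plug-in idea concrete, the following is a minimal sketch and not necessarily the exact construction analyzed in the paper. It assumes bounded data, a uniform dither on $[-\lambda, \lambda]$ drawn independently for every quantized entry, and the quantizer $Q(v) = \lambda \, \mathrm{sign}(v + \tau)$, which is unbiased for $v$ whenever $|v| \le \lambda$. The function names `one_bit_dither` and `plug_in_estimate`, the scale parameters, and the synthetic data are all hypothetical choices made for illustration. Products of quantized predictors estimate the off-diagonal entries of $X^\top X / n$, the quantized squares estimate its diagonal, and products of quantized predictors and responses estimate $X^\top y / n$.

```python
import numpy as np


def one_bit_dither(v, lam, rng):
    """One-bit dithered quantizer: lam * sign(v + tau), tau ~ Uniform(-lam, lam).

    Unbiased for v (entrywise) as long as |v| <= lam. Assumed form, for illustration.
    """
    tau = rng.uniform(-lam, lam, size=np.shape(v))
    return lam * np.where(v + tau >= 0, 1.0, -1.0)


def plug_in_estimate(qX, qX2, qy):
    """Solve the least squares normal equations with plug-in estimates.

    qX  : quantized predictors, shape (n, d)
    qX2 : quantized squared predictors, shape (n, d)
    qy  : quantized responses, shape (n,)
    """
    n, d = qX.shape
    # Off-diagonal entries: products of independently dithered bits are unbiased.
    G = qX.T @ qX / n
    # Diagonal entries: qX**2 is constant, so use the quantized squares instead.
    G[np.diag_indices(d)] = qX2.mean(axis=0)
    # Linear term: plug-in estimate of X^T y / n.
    b = qX.T @ qy / n
    return np.linalg.solve(G, b)


# Synthetic example with bounded predictors and bounded noise (assumed setup).
rng = np.random.default_rng(0)
n, d = 200_000, 3
beta = np.array([0.5, -0.3, 0.2])
X = rng.uniform(-1.0, 1.0, size=(n, d))
y = X @ beta + rng.uniform(-0.2, 0.2, size=n)  # |y| <= 1.2

qX = one_bit_dither(X, lam=1.0, rng=rng)        # |x| <= 1
qX2 = one_bit_dither(X**2, lam=1.0, rng=rng)    # 0 <= x^2 <= 1
qy = one_bit_dither(y, lam=1.5, rng=rng)        # |y| <= 1.5

beta_hat = plug_in_estimate(qX, qX2, qy)
print("true beta:     ", beta)
print("plug-in beta:  ", beta_hat)
print("l2 error:      ", np.linalg.norm(beta_hat - beta))
```

In this sketch each scale $\lambda$ is chosen just large enough to cover the range of the quantity being quantized, since a larger scale preserves unbiasedness but inflates the variance of every bit. The plug-in matrix `G` is not guaranteed to be positive definite in small samples, so a practical implementation would likely need a safeguard before solving.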