Longitudinal or panel data can be represented as a matrix with rows indexed by units and columns indexed by time. We consider inferential questions associated with the missing-data version of panel data induced by staggered adoption. We propose a computationally efficient estimation procedure, involving only simple matrix algebra and a singular value decomposition, and prove non-asymptotic, high-probability bounds on its error in estimating each missing entry. By controlling the proximity of the estimation error to a suitably scaled Gaussian variable, we develop and analyze a data-driven procedure for constructing entrywise confidence intervals with pre-specified coverage. Despite its simplicity, our procedure turns out to be instance-optimal: we prove that the width of our confidence intervals matches a non-asymptotic, instance-wise lower bound derived via a Bayesian Cramér-Rao argument. We illustrate the sharpness of our theoretical characterization on a variety of numerical examples. Our analysis is based on a general inferential toolbox for SVD-based algorithms applied to the matrix denoising model, which may be of independent interest.
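To make the setting concrete, here is a minimal sketch of SVD-based block completion for a single missing entry under staggered adoption. It assumes a rank-r signal matrix and uses the fact that, for a target entry (i, t), the units still untreated at time t, observed over unit i's pre-adoption periods, form a fully observed block A. The entry is then estimated as `C @ pinv(A_r) @ B`, where A_r is the rank-r truncated SVD of A. The function name `estimate_entry`, the `adopt` array of adoption times, and this particular block-completion formula are illustrative assumptions, not the authors' exact procedure, and the sketch does not construct confidence intervals.

```python
import numpy as np


def estimate_entry(Y, adopt, i, t, rank):
    """Hypothetical sketch: estimate the missing entry Y[i, t] of a units-by-time
    panel under staggered adoption, assuming a rank-`rank` signal.

    Y     : (n, T) array; entries with t >= adopt[unit] are unobserved (may be NaN).
    adopt : (n,) integer adoption times; adopt[j] == T means unit j is never treated.
    """
    if t < adopt[i]:
        raise ValueError("Y[i, t] is observed; nothing to estimate")
    donors = np.flatnonzero(adopt > t)  # units still untreated at time t
    pre = np.arange(adopt[i])           # periods in which unit i is untreated
    if len(donors) < rank or len(pre) < rank:
        raise ValueError("observed blocks are too small for the requested rank")

    A = Y[np.ix_(donors, pre)]  # fully observed: donors x pre-adoption periods
    B = Y[donors, t]            # donors at the target period
    C = Y[i, pre]               # target unit over its pre-adoption periods

    # Pseudo-inverse of the rank-`rank` truncated SVD of A: V_r diag(1/s_r) U_r^T.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    A_r_pinv = Vt[:rank].T @ np.diag(1.0 / s[:rank]) @ U[:, :rank].T

    # Block-completion estimate of the missing scalar: C pinv(A_r) B.
    return float(C @ A_r_pinv @ B)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, T, r = 30, 20, 2
    M = rng.normal(size=(n, r)) @ rng.normal(size=(T, r)).T  # noiseless rank-2 panel
    adopt = np.full(n, T)                                     # units 0..9 never treated
    adopt[10:] = rng.integers(5, T, size=n - 10)              # staggered adoption times
    observed = np.arange(T)[None, :] < adopt[:, None]
    Y = np.where(observed, M, np.nan)                         # hide post-adoption entries
    print(estimate_entry(Y, adopt, i=15, t=T - 1, rank=r), M[15, T - 1])
```

In this noiseless example the two printed values should agree up to floating-point error, since generic rank-r factors make the block-completion identity exact. With additive noise, the rank-r truncation of A is what performs the denoising; quantifying the resulting error entrywise is the subject of the analysis described above.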