Accurately approximating solutions of large linear systems is crucial, especially when those systems are built from real-world data. Real-world data inevitably contains missing entries, and current approaches for handling them, such as deletion and imputation, can introduce bias. Recent studies have proposed adaptations of stochastic gradient descent (SGD) for specific missing-data models. In this work, we propose a new algorithm, $\ell$-tuple mSGD, for the setting in which data are missing in a block-wise, tuple pattern. We prove that the proposed method uses unbiased estimates of the gradient of the least squares objective in the presence of tuple missing data. We also draw connections between $\ell$-tuple mSGD and previously established SGD-type methods for missing data. Furthermore, we prove that the algorithm converges when the step size is updated across iterations, and we empirically demonstrate its convergence on synthetic data. Lastly, we evaluate $\ell$-tuple mSGD on real-world continuous glucose monitoring (CGM) device data.
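To make the notion of an unbiased gradient estimate concrete, the following minimal sketch illustrates the idea on a simpler model than the one studied here. It assumes each entry of a row is observed independently with a known probability `p` (an entrywise Bernoulli model of the kind used in earlier SGD-type methods for missing data), not the tuple pattern of $\ell$-tuple mSGD. The function names, the example row, and the value of `p` are all hypothetical. The sketch enumerates every observation mask to compare the exact expectation of a corrected gradient estimate and of a naive zero-filled estimate against the true single-row least squares gradient.

```python
import itertools

import numpy as np


def corrected_gradient(a_obs, b_i, x, p):
    """Corrected single-row gradient estimate (assumed entrywise Bernoulli(p) model).

    a_obs is the row with its missing entries replaced by zero.
    """
    # Rescaled residual term: accounts for each observed entry appearing w.p. p.
    residual_term = a_obs * (a_obs @ x - p * b_i) / p**2
    # Diagonal correction: removes the extra mass on squared entries.
    diagonal_term = (1.0 - p) / p**2 * (a_obs**2) * x
    return residual_term - diagonal_term


def naive_gradient(a_obs, b_i, x):
    """Zero-filled estimate: treats missing entries as true zeros."""
    return a_obs * (a_obs @ x - b_i)


def exact_expectation(estimator, a, p):
    """Average an estimator over all observation masks, weighted by probability."""
    total = np.zeros_like(a)
    for mask in itertools.product([0.0, 1.0], repeat=a.size):
        mask = np.array(mask)
        k = int(mask.sum())
        weight = p**k * (1.0 - p) ** (a.size - k)
        total += weight * estimator(a * mask)
    return total


# Hypothetical row, right-hand side entry, iterate, and observation probability.
a = np.array([1.0, -2.0, 0.5, 3.0])
x = np.array([0.2, 0.1, -0.4, 0.3])
b_i = 1.5
p = 0.8

true_grad = a * (a @ x - b_i)
expected_corrected = exact_expectation(lambda r: corrected_gradient(r, b_i, x, p), a, p)
expected_naive = exact_expectation(lambda r: naive_gradient(r, b_i, x), a, p)

print("true gradient:       ", true_grad)
print("corrected, expected: ", expected_corrected)
print("zero-filled, expected:", expected_naive)
```

Under these assumptions the corrected estimate matches the true gradient in expectation, while the zero-filled estimate does not, which is the kind of bias that motivates building the missing-data model into the SGD update itself.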