Modern data sets, such as those in healthcare and e-commerce, are often collected from many individuals or systems, yet each source alone provides too little data to estimate its own, often high-dimensional, model parameters. If the systems share structure, however, data from other systems can be leveraged to help estimate individual parameters that would otherwise be non-identifiable. In this paper, we assume that the systems share a latent low-dimensional parameter space, and we propose a method for recovering the $d$-dimensional parameters of $N$ different linear systems, even when there are only $T<d$ observations per system. To do so, we develop a three-step algorithm that estimates the low-dimensional subspace spanned by the systems' parameters and produces refined parameter estimates within that subspace. We provide finite-sample guarantees on the subspace estimation error of our proposed method. Finally, we experimentally validate our method in simulations with i.i.d. regression data as well as correlated time series data.
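The abstract does not spell out the three steps, so the following is only a minimal sketch of one plausible instantiation, not the authors' algorithm. It assumes isotropic Gaussian covariates (so that $X_i^\top y_i / T$ is an unbiased proxy for each system's parameter), a known subspace dimension `r`, and i.i.d. regression data; the function name `estimate_shared_subspace` and all simulation settings are hypothetical choices made for illustration.

```python
import numpy as np


def estimate_shared_subspace(X, Y, r):
    """Illustrative sketch (assumed, not taken from the paper).

    X: array of shape (N, T, d), covariates for N systems.
    Y: array of shape (N, T), responses for N systems.
    r: assumed dimension of the shared subspace.
    """
    N, T, d = X.shape

    # Step 1: crude per-system proxy X_i^T y_i / T.
    # Unbiased only under the assumption E[x x^T] = I.
    theta_crude = np.einsum("ntd,nt->nd", X, Y) / T  # shape (N, d)

    # Step 2: top-r eigenvectors of the averaged outer products
    # of the crude proxies estimate the shared subspace.
    M = theta_crude.T @ theta_crude / N  # shape (d, d)
    _, eigvecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    U_hat = eigvecs[:, -r:]  # shape (d, r)

    # Step 3: refit each system inside the estimated subspace,
    # which needs only r <= T coefficients per system.
    theta_refined = np.empty((N, d))
    for i in range(N):
        Z = X[i] @ U_hat  # projected covariates, shape (T, r)
        a_hat, *_ = np.linalg.lstsq(Z, Y[i], rcond=None)
        theta_refined[i] = U_hat @ a_hat

    return U_hat, theta_crude, theta_refined


# Small simulation with T < d observations per system (hypothetical settings).
rng = np.random.default_rng(0)
N, T, d, r, sigma = 2000, 8, 20, 2, 0.1

U, _ = np.linalg.qr(rng.standard_normal((d, r)))  # true orthonormal basis
A = rng.standard_normal((N, r))  # low-dimensional coordinates
theta = A @ U.T  # true parameters, shape (N, d)

X = rng.standard_normal((N, T, d))
Y = np.einsum("ntd,nd->nt", X, theta) + sigma * rng.standard_normal((N, T))

U_hat, theta_crude, theta_refined = estimate_shared_subspace(X, Y, r)

# Spectral norm of the projector difference = sine of the largest principal angle.
subspace_err = np.linalg.norm(U @ U.T - U_hat @ U_hat.T, 2)
crude_err = np.mean(np.sum((theta_crude - theta) ** 2, axis=1))
refined_err = np.mean(np.sum((theta_refined - theta) ** 2, axis=1))

print(f"subspace error: {subspace_err:.3f}")
print(f"mean squared error, crude proxies:     {crude_err:.3f}")
print(f"mean squared error, refined estimates: {refined_err:.3f}")
```

In this sketch, pooling across systems is what makes the problem tractable: no single system with $T<d$ observations can identify its own $d$-dimensional parameter, but once the subspace is estimated each system only has to fit `r` coefficients. With correlated time series data the Step 1 proxy would no longer be unbiased as written, so that step would need to be adapted.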