We study the problem of identifying change points in high-dimensional generalized linear models and propose an approach based on sample-weighted empirical risk minimization. Our method, Weighted ERM, encodes priors on the change points in weights assigned to each sample, yielding weighted versions of standard estimators such as M-estimators and maximum-likelihood estimators. Under mild assumptions on the data, we obtain a precise asymptotic characterization of the performance of our method for general Gaussian designs, in the high-dimensional limit where the number of samples and the covariate dimension grow proportionally. We show how this characterization can be used to efficiently construct a posterior distribution over change points. Numerical experiments on both simulated and real data illustrate the efficacy of Weighted ERM compared to existing approaches, and demonstrate that sample weights constructed with weakly informative priors can yield accurate change point estimators. Our method is implemented as an open-source package, weightederm, available in Python and R.
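To make the idea of "encoding priors via sample weights" concrete, here is a minimal sketch for a linear model with a single change point. It is not the weightederm API or the authors' exact construction. The weighting rule is an assumption: sample i (0-indexed) counts toward the pre-change fit with weight P(τ > i) and toward the post-change fit with weight P(τ ≤ i) under a prior on τ ∈ {0, …, n}. The helpers `prior_to_weights`, `weighted_least_squares` and `estimate_change_point` are hypothetical names. The final step is a simple plug-in estimate that picks the split minimizing a two-segment squared loss. It is not the posterior construction described above.

```python
import numpy as np


def prior_to_weights(prior):
    """Hypothetical mapping from a prior over tau in {0, ..., n} to sample weights.

    Assumption: sample i is pre-change iff i < tau, so its pre-change weight
    is P(tau > i) and its post-change weight is P(tau <= i).
    """
    prior = np.asarray(prior, dtype=float)
    prior = prior / prior.sum()
    # tail[k] = P(tau >= k), computed as a reverse cumulative sum.
    tail = np.cumsum(prior[::-1])[::-1]
    w_pre = tail[1:]       # P(tau > i) = P(tau >= i + 1), one weight per sample
    w_post = 1.0 - w_pre   # P(tau <= i)
    return w_pre, w_post


def weighted_least_squares(X, y, w, ridge=1e-8):
    """Minimize sum_i w_i * (y_i - x_i^T beta)^2 + ridge * ||beta||^2."""
    Xw = X * w[:, None]
    gram = X.T @ Xw + ridge * np.eye(X.shape[1])
    return np.linalg.solve(gram, Xw.T @ y)


def estimate_change_point(X, y, beta_pre, beta_post):
    """Plug-in estimate: the tau in {0, ..., n} minimizing the two-segment loss."""
    r_pre = (y - X @ beta_pre) ** 2
    r_post = (y - X @ beta_post) ** 2
    # loss[tau] = sum_{i < tau} r_pre[i] + sum_{i >= tau} r_post[i]
    loss = np.concatenate(([0.0], np.cumsum(r_pre))) + np.concatenate(
        (np.cumsum(r_post[::-1])[::-1], [0.0])
    )
    return int(np.argmin(loss))


# Simulated example with one change point at tau_true.
rng = np.random.default_rng(0)
n, p, tau_true = 400, 10, 250
X = rng.standard_normal((n, p))
beta1, beta2 = np.ones(p), -np.ones(p)
signal = np.where(np.arange(n) < tau_true, X @ beta1, X @ beta2)
y = signal + 0.1 * rng.standard_normal(n)

# Weakly informative (uniform) prior over tau in {0, ..., n}.
w_pre, w_post = prior_to_weights(np.ones(n + 1))
beta_pre = weighted_least_squares(X, y, w_pre)
beta_post = weighted_least_squares(X, y, w_post)
tau_hat = estimate_change_point(X, y, beta_pre, beta_post)
print(tau_hat)
```

Even under a uniform prior, each sample's weights lean toward the regime it is more likely to belong to. As a result, the two weighted fits separate enough for the segment loss to localize the change. Other losses, such as a weighted logistic loss for a GLM, would replace the squared residuals in this sketch.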