For the data segmentation problem in high-dimensional linear regression settings, a common assumption is that the regression parameters are segment-wise sparse, which enables many existing methods to estimate the parameters locally via $\ell_1$-regularised maximum likelihood-type estimation and contrast them for change point detection. Contrary to common belief, we show that the sparsity of neither regression parameters nor their differences, a.k.a.\ differential parameters, is necessary for achieving consistency in multiple change point detection. In fact, both statistically and computationally, better efficiency is attained by a simple strategy that scans for large discrepancies in local covariance between the regressors and the response. We go a step further and propose a suite of tools for inference on the differential parameters post-segmentation, which are applicable even when the regression parameters themselves are non-sparse. Theoretical investigations are conducted under general conditions permitting non-Gaussianity, temporal dependence and ultra-high dimensionality. Numerical experiments demonstrate the competitiveness of the proposed methodologies.
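To make the covariance-scanning idea concrete, the following is a minimal sketch and not the authors' procedure. It assumes a moving-window contrast: at each time point $k$, the local averages of $x_t y_t$ over the $G$ observations before and after $k$ are compared, and their difference is aggregated by the maximum absolute entry. The function name `covariance_scan`, the bandwidth `G`, the max-norm aggregation, the $\sqrt{G/2}$ scaling and the simulated data are all illustrative assumptions.

```python
import numpy as np


def covariance_scan(X, y, G):
    """Moving-window contrast of the local covariance between X and y.

    Hypothetical helper: for each k, compares the average of x_t * y_t
    over the G observations after k with that over the G observations
    up to k. Entries where a full window is unavailable are left as NaN.
    """
    n, p = X.shape
    # Cumulative sums of x_t * y_t with a leading zero row, so that
    # S[b] - S[a] is the sum over rows a, ..., b - 1.
    S = np.vstack([np.zeros((1, p)), np.cumsum(X * y[:, None], axis=0)])
    stat = np.full(n, np.nan)
    for k in range(G, n - G + 1):
        left = (S[k] - S[k - G]) / G
        right = (S[k + G] - S[k]) / G
        # Max-norm aggregation (an assumption of this sketch).
        stat[k] = np.sqrt(G / 2) * np.max(np.abs(right - left))
    return stat


# Simulated example with dense (non-sparse) regression parameters
# and a single change at t = 2000.
rng = np.random.default_rng(0)
n, p, G, cp = 4000, 10, 800, 2000
X = rng.standard_normal((n, p))
beta1 = np.full(p, 0.5)   # every coefficient is non-zero
beta2 = -beta1            # the difference is dense as well
signal = np.concatenate([X[:cp] @ beta1, X[cp:] @ beta2])
y = signal + 0.5 * rng.standard_normal(n)

stat = covariance_scan(X, y, G)
k_hat = int(np.nanargmax(stat))  # location of the largest contrast
print("estimated change point:", k_hat)
```

No regression parameter is estimated at any step; the scan only needs running sums of $x_t y_t$, which is why its cost is linear in the sample size and the dimension. Turning the statistic into a multiple change point detector would additionally require a threshold and a rule for picking local maximisers, which this sketch leaves out.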