For data segmentation in high-dimensional linear regression settings, the regression parameters are often assumed to be sparse segment-wise, which enables many existing methods to estimate the parameters locally via $\ell_1$-regularised maximum likelihood-type estimation and then contrast the local estimates for change point detection. Contrary to this common practice, we show that sparsity of neither the regression parameters nor their differences (the differential parameters) is necessary for consistency in multiple change point detection. In fact, better efficiency is attained, both statistically and computationally, by a simple strategy that scans for large discrepancies in the local covariance between the regressors and the response. We go a step further and propose a suite of tools for direct inference on the differential parameters post-segmentation, which are applicable even when the regression parameters themselves are non-sparse. Theoretical investigations are conducted under general conditions permitting non-Gaussianity, temporal dependence and ultra-high dimensionality. Numerical results on simulated and macroeconomic datasets demonstrate the competitiveness and efficacy of the proposed methods.
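The covariance-scanning idea can be illustrated with a minimal sketch. This is not the procedure proposed in the paper: the function name `local_covariance_scan`, the moving-window form with bandwidth `G`, the max-norm aggregation and the `sqrt(G / 2)` scaling are all assumptions made here for illustration. For each candidate location `k`, it contrasts the sample covariance between regressors and response over the `G` observations before `k` with that over the `G` observations after `k`; no regression parameter is estimated.

```python
import numpy as np


def local_covariance_scan(X, y, G):
    """Moving-window contrast of the local covariance between X and y.

    Hypothetical illustration, not the paper's estimator. Returns an array
    `stat` of length n + 1, where stat[k] compares rows k-G..k-1 with rows
    k..k+G-1 (NaN where a full window does not fit on both sides).
    """
    n, p = X.shape
    xy = X * y[:, None]  # row t holds x_t * y_t
    # csum[k] is the sum of x_t * y_t over rows 0..k-1
    csum = np.vstack([np.zeros(p), np.cumsum(xy, axis=0)])
    stat = np.full(n + 1, np.nan)
    for k in range(G, n - G + 1):
        left = (csum[k] - csum[k - G]) / G    # local covariance before k
        right = (csum[k + G] - csum[k]) / G   # local covariance after k
        # Max-norm of the difference; the scaling is an assumed choice
        stat[k] = np.sqrt(G / 2) * np.max(np.abs(right - left))
    return stat


if __name__ == "__main__":
    # Simulated example with a dense (non-sparse) parameter vector whose
    # sign flips at a single change point c; all settings are illustrative.
    rng = np.random.default_rng(0)
    n, p, G, c = 600, 10, 150, 300
    X = rng.standard_normal((n, p))
    beta = np.ones(p)
    signs = np.where(np.arange(n) < c, 1.0, -1.0)
    y = signs * (X @ beta) + 0.5 * rng.standard_normal(n)
    stat = local_covariance_scan(X, y, G)
    print("location of the largest discrepancy:", int(np.nanargmax(stat)))
```

In this sketch the cumulative sums make the scan linear in the sample size for a fixed bandwidth. Turning the statistic into change point estimates would additionally need a threshold and a rule for locating local maximisers, both of which are left out here.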