We introduce 'charcoal', a new methodology for estimating the location of sparse changes in high-dimensional linear regression coefficients, without assuming that those coefficients are individually sparse. The procedure constructs a different sketch (projection) of the design matrix at each time point, where consecutive projection matrices differ in sign in exactly one column. The resulting sequence of sketched design matrices is then compared against a single sketched response vector to form a sequence of test statistics whose behaviour shows a surprising link to the well-known CUSUM statistics of univariate changepoint analysis. The procedure is computationally attractive, and we derive strong theoretical guarantees for its estimation accuracy. Simulations confirm that the method performs well across a wide range of settings, and a real-world application to a large single-cell RNA sequencing dataset showcases its practical relevance.
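For readers unfamiliar with the CUSUM statistics of univariate changepoint analysis mentioned in the abstract, the following is a minimal sketch of the classical version for a single change in mean. It is not the charcoal procedure itself, which operates on sketched design matrices; the function names `cusum_statistics` and `estimate_changepoint` are illustrative and do not come from the paper.

```python
import math


def cusum_statistics(x):
    """Classical univariate CUSUM transform for a change in mean.

    For t = 1, ..., n-1 this returns
        T_t = sqrt(t * (n - t) / n) * (mean(x[t:]) - mean(x[:t])).
    Illustrative helper only; not part of the charcoal method.
    """
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")
    total = sum(x)
    left_sum = 0.0
    stats = []
    for t in range(1, n):
        left_sum += x[t - 1]                      # running sum of the first t values
        left_mean = left_sum / t
        right_mean = (total - left_sum) / (n - t)
        scale = math.sqrt(t * (n - t) / n)
        stats.append(scale * (right_mean - left_mean))
    return stats


def estimate_changepoint(x):
    """Return the t in {1, ..., n-1} that maximises |T_t|.

    The returned value is the number of observations before the estimated change.
    """
    stats = cusum_statistics(x)
    best_index = max(range(len(stats)), key=lambda i: abs(stats[i]))
    return best_index + 1


if __name__ == "__main__":
    # Noiseless toy series whose mean shifts after the fourth observation.
    series = [0, 0, 0, 0, 5, 5, 5, 5, 5, 5]
    print(estimate_changepoint(series))
```

On a noiseless piecewise-constant series such as the toy one above, the absolute CUSUM statistic peaks at the true change location; with noise, the maximiser serves as an estimate of it.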