We study the high-dimensional partial linear model, in which the linear part has a high-dimensional sparse regression coefficient vector and the nonparametric part includes a function whose derivatives are of bounded total variation. We extend univariate trend filtering to develop partial linear trend filtering, a doubly penalized least squares estimation approach based on an $\ell_1$ penalty and a total variation penalty. Like trend filtering in univariate nonparametric regression, partial linear trend filtering can be computed efficiently and achieves the optimal error rate for estimating the nonparametric function. This in turn yields the oracle rate for the linear part, that is, the rate attainable if the underlying nonparametric function were known. We compare the proposed approach with a standard smoothing-spline-based method and show, both empirically and theoretically, that the former outperforms the latter when the underlying function has heterogeneous smoothness. We apply our approach to the IDATA study to investigate the relationship between metabolomic profiles and ultra-processed food (UPF) intake, efficiently identifying key metabolites associated with UPF consumption and demonstrating strong predictive performance.
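For orientation, a doubly penalized least squares criterion of the kind described above could be sketched as follows. The notation is assumed for illustration and is not taken from the paper: responses $y_i$, high-dimensional covariates $x_i \in \mathbb{R}^p$, a univariate covariate $z_i$, derivative order $k$, and tuning parameters $\lambda, \gamma \ge 0$. The $1/(2n)$ scaling is also an assumption.

$$
(\hat\beta, \hat g) \in \arg\min_{\beta \in \mathbb{R}^p,\; g} \; \frac{1}{2n} \sum_{i=1}^{n} \bigl( y_i - x_i^\top \beta - g(z_i) \bigr)^2 + \lambda \|\beta\|_1 + \gamma \, \mathrm{TV}\bigl(g^{(k)}\bigr)
$$

In this sketch the $\ell_1$ term encourages sparsity in $\beta$, while the total variation term on the $k$-th derivative of $g$ allows the fitted function to adapt to smoothness that varies across its domain.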