Nonparametric varying coefficient (NVC) models are useful for modeling time-varying effects on responses that are measured repeatedly for the same subjects. When the number of covariates is moderate or large, it is desirable to perform variable selection among the varying coefficient functions. However, existing methods for variable selection in NVC models either fail to account for within-subject correlations or require the practitioner to specify a parametric form for the correlation structure. In this paper, we introduce the nonparametric varying coefficient spike-and-slab lasso (NVC-SSL) for Bayesian high-dimensional NVC models. By introducing functional random effects, our method models within-subject correlations flexibly, without requiring a parametric covariance function. We further propose several scalable optimization and Markov chain Monte Carlo (MCMC) algorithms. For variable selection, we propose an Expectation Conditional Maximization (ECM) algorithm that rapidly obtains maximum a posteriori (MAP) estimates. Our ECM algorithm scales linearly in the total number of observations $N$ and the number of covariates $p$. For uncertainty quantification, we introduce an approximate MCMC algorithm that also scales linearly in both $N$ and $p$. We demonstrate the scalability, variable selection performance, and inferential capabilities of our method through simulations and a real data application. These algorithms are implemented in the publicly available R package NVCSSL on the Comprehensive R Archive Network (CRAN).
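To make the setting concrete, the following is a minimal sketch of the kind of data an NVC model targets: repeated measurements per subject, many covariates, and only a few coefficient functions that are actually nonzero. It is an illustration under stated assumptions, not the paper's simulation design and not the NVCSSL interface. The function name `simulate_nvc`, the two nonzero coefficient functions, and the random intercept plus slope (a crude stand-in for a functional random effect) are all hypothetical choices.

```python
import numpy as np

def simulate_nvc(n_subjects=50, n_times=8, p=20, seed=0):
    """Simulate repeated measures from a sparse varying coefficient model.

    Illustrative only: every modeling choice below is an assumption.
    """
    rng = np.random.default_rng(seed)
    N = n_subjects * n_times                      # total number of observations
    subject = np.repeat(np.arange(n_subjects), n_times)
    t = rng.uniform(0.0, 1.0, size=N)             # observation times in [0, 1]
    X = rng.normal(size=(N, p))                   # covariates at each observation

    # Coefficient functions beta_k(t) evaluated at each observation time.
    # Only the first two are nonzero, so the remaining p - 2 covariates are noise.
    beta = np.zeros((N, p))
    beta[:, 0] = np.sin(2.0 * np.pi * t)
    beta[:, 1] = 2.0 * t ** 2

    # Subject-specific deviation that varies with time. This induces
    # within-subject correlation; it is a simple stand-in, not the paper's
    # functional random effects.
    intercepts = rng.normal(scale=0.5, size=n_subjects)
    slopes = rng.normal(scale=0.5, size=n_subjects)
    random_effect = intercepts[subject] + slopes[subject] * t

    noise = rng.normal(scale=0.3, size=N)
    y = np.sum(X * beta, axis=1) + random_effect + noise
    return y, X, t, subject, beta

y, X, t, subject, beta = simulate_nvc()
```

Variable selection in this setting amounts to recovering which columns of `beta` are not identically zero (here, the first two) while accounting for the correlation among each subject's observations.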