High-dimensional regression and regression with a left-censored response are each well-studied topics. Even so, few methods have been proposed that handle both complications simultaneously. The Tobit model, long the standard method for censored regression in economics, has not been adapted for high-dimensional regression at all. To fill this gap and bring modern techniques from high-dimensional statistics to left-censored regression, we propose several penalized Tobit models. We develop a fast algorithm that combines quadratic minimization with coordinate descent to compute the penalized Tobit solution path. Theoretically, we analyze the Tobit lasso and Tobit with a folded concave penalty, bounding the $\ell_2$ estimation loss for the former and proving that a local linear approximation estimator for the latter possesses the strong oracle property. In an extensive simulation study, our penalized Tobit models provide more accurate predictions and parameter estimates than other methods. We use a penalized Tobit model to analyze high-dimensional left-censored HIV viral load data from the AIDS Clinical Trials Group and identify potential drug resistance mutations in the HIV genome. Appendices contain intermediate theoretical results and technical proofs.