We present biniLasso and its sparse variant (sparse biniLasso), novel methods for prognostic analysis of high-dimensional survival data that enable detection of multiple cut-points per feature. Our approach builds on the Cox proportional hazards model with two key innovations: (1) a cumulative binarization scheme with $L_1$-penalized coefficients operating on context-dependent cut-point candidates, and (2) for sparse biniLasso, additional uniLasso regularization to enforce sparsity while preserving univariate coefficient patterns. These innovations yield substantially improved interpretability, computational efficiency (4 to 11 times faster than existing approaches), and prediction performance. Through extensive simulations, we demonstrate superior performance in cut-point detection, particularly in high-dimensional settings. Application to three genomic cancer datasets from The Cancer Genome Atlas (TCGA) confirms the methods' practical utility, with both variants showing enhanced risk prediction accuracy compared to conventional techniques.
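To make the cumulative binarization idea concrete, the following is a minimal sketch rather than the authors' implementation. It assumes that a feature $x$ with ordered candidate cut-points $c_1 < \dots < c_K$ is encoded as indicators $z_k = \mathbb{1}(x > c_k)$, so that each coefficient $\beta_k$ represents the increment in log-hazard when $x$ crosses $c_k$, and an $L_1$ penalty that sets $\beta_k = 0$ discards that cut-point. The function name `cumulative_binarize`, the candidate cut-points, and the coefficient values are hypothetical and chosen only for illustration; no Cox model is fitted here.

```python
import numpy as np


def cumulative_binarize(x, cutpoints):
    """Encode x as cumulative indicators z_k = 1(x > c_k).

    Hypothetical helper for illustration; the exact encoding used by
    biniLasso is an assumption here.
    """
    x = np.asarray(x, dtype=float)
    cuts = np.sort(np.asarray(cutpoints, dtype=float))
    # Row i, column k is 1 when x[i] exceeds the k-th candidate cut-point.
    return (x[:, None] > cuts[None, :]).astype(int)


# Toy feature values and assumed candidate cut-points.
x = np.array([0.5, 1.5, 2.5, 3.5])
cuts = np.array([1.0, 2.0, 3.0])
Z = cumulative_binarize(x, cuts)

# Made-up coefficients standing in for an L1-penalized fit:
# the zero entry means the middle candidate is not selected.
beta = np.array([0.8, 0.0, 0.4])

# Contribution of x to the linear predictor: the sum of increments
# for every cut-point that x has crossed (a step function of x).
log_hazard = Z @ beta

# Cut-points that survive the penalty.
selected = cuts[beta != 0]
```

Under this encoding the contribution of $x$ to the linear predictor is a step function whose jumps sit at the selected cut-points, which is what allows several cut-points per feature to be read directly from the nonzero coefficients.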