We introduce Binacox+, an extension of the Binacox method for prognostic analysis of high-dimensional survival data that enables the detection of multiple cut-points per feature. The original Binacox method builds on the Cox proportional hazards model, combining one-hot encoding with the binarsity penalty to perform feature selection and cut-point detection simultaneously. In this work, we enhance Binacox by incorporating a novel penalty term based on the L1 norm of the coefficients of a cumulative binarization, defined over a set of pre-specified, context-dependent cut-point candidates. Compared to the original method, this new penalty improves interpretability, significantly reduces computational time, and enhances prediction performance. We conducted extensive simulation studies to evaluate the statistical and computational properties of Binacox+ in comparison to Binacox. The simulation results demonstrate that Binacox+ is superior at detecting important cut-points, particularly in high-dimensional settings, while drastically reducing computation time. As a case study, we applied both methods to three real-world genomic cancer datasets from The Cancer Genome Atlas (TCGA). The empirical results confirm that Binacox+ outperforms Binacox in risk prediction accuracy and computational efficiency, making it a powerful tool for survival analysis in high-dimensional biomedical data.
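To make the contrast between the two encodings concrete, the following is a minimal sketch in Python with NumPy, not the authors' implementation. It assumes that cumulative binarization means one indicator column per candidate cut-point, equal to 1 when the feature value exceeds that cut-point; the abstract does not spell out the exact definition. The function names, the toy feature values, the candidate cut-points, and the coefficient vector `beta` are all hypothetical and chosen only for illustration. Under this assumption, each cumulative coefficient is the jump in the log-hazard contribution at its cut-point, so an L1 penalty on those coefficients directly zeroes out unused cut-points.

```python
import numpy as np


def one_hot_bins(x, cuts):
    """One indicator column per interval delimited by the sorted cut-points."""
    cuts = np.asarray(cuts, dtype=float)
    # Index i means cuts[i-1] < x <= cuts[i]; values range over 0..len(cuts).
    bin_index = np.searchsorted(cuts, x, side="left")
    return np.eye(len(cuts) + 1, dtype=int)[bin_index]


def cumulative_bins(x, cuts):
    """Column k is 1 when x exceeds the k-th cut-point (assumed definition)."""
    cuts = np.asarray(cuts, dtype=float)
    x = np.asarray(x, dtype=float)
    return (x[:, None] > cuts[None, :]).astype(int)


# Toy feature values and pre-specified candidate cut-points (hypothetical).
x = np.array([0.2, 1.5, 2.7, 4.1])
cuts = [1.0, 2.0, 3.0]

onehot = one_hot_bins(x, cuts)
cumul = cumulative_bins(x, cuts)

# Hypothetical sparse coefficients: only the cut-point at 2.0 is active.
beta = np.array([0.0, 0.8, 0.0])
l1_penalty = np.abs(beta).sum()

# The same step function written with one coefficient per one-hot bin.
theta = np.concatenate([[0.0], np.cumsum(beta)])
total_variation = np.abs(np.diff(theta)).sum()

print(cumul)
print("log-hazard contribution:", cumul @ beta)
print("L1 on cumulative coefficients:", l1_penalty)
print("total variation of one-hot coefficients:", total_variation)
```

In this toy setup the two encodings describe the same piecewise-constant effect, and the L1 norm of the cumulative coefficients equals the total variation of the corresponding one-hot coefficients. The sketch omits the Cox partial likelihood and any optimization step; it only illustrates the encoding and the penalty term.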