The matched case-control design, until recently used mostly in epidemiological studies, is becoming customary in biomedical applications as well. For instance, in omics studies it is common to compare cancer and healthy tissue from the same patient. Furthermore, researchers now routinely collect data from multiple heterogeneous sources that they wish to relate to case-control status. This highlights the need to develop and implement statistical methods that take these trends into account. We present the R package penalizedclr, which implements the penalized conditional logistic regression model for analyzing matched case-control studies. It allows different penalties for different blocks of covariates and is therefore particularly useful for multi-source omics data. Both L1 and L2 penalties are implemented. Additionally, the package implements stability selection for variable selection in this regression model. The package fills a gap in the available software for fitting high-dimensional conditional logistic regression models that account for the matched design and the block structure of the predictors. The output is a set of selected variables that are significantly associated with case-control status. These features can then be investigated further through functional interpretation or validated in more targeted studies.
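
To make the model concrete, the following is a sketch of a block-penalized conditional logistic regression objective in its standard textbook form. The notation is an assumption introduced here for illustration and is not taken from the package documentation: stratum i contains one case (index 0) and K matched controls, the coefficient vector is split into B blocks, and each block b has its own L1 and L2 penalty weights. The parameterization used inside penalizedclr may differ.

```latex
% Conditional log-likelihood for n matched strata, each with one case (k = 0)
% and K controls (k = 1, ..., K); x_{ik} is the covariate vector of subject k
% in stratum i. Stratum-specific intercepts cancel by conditioning.
\ell(\beta) = \sum_{i=1}^{n} \left[ x_{i0}^{\top}\beta
      - \log \sum_{k=0}^{K} \exp\!\left( x_{ik}^{\top}\beta \right) \right]

% Block-penalized estimator (assumed form): \beta^{(b)} denotes the
% coefficients of block b, with block-specific weights
% \lambda_{1,b} \ge 0 (L1) and \lambda_{2,b} \ge 0 (L2).
\hat{\beta} = \arg\min_{\beta} \left\{ -\ell(\beta)
      + \sum_{b=1}^{B} \left( \lambda_{1,b}\,\lVert \beta^{(b)} \rVert_{1}
      + \lambda_{2,b}\,\lVert \beta^{(b)} \rVert_{2}^{2} \right) \right\}
```

In the 1:1 matched setting (K = 1), the likelihood term reduces to an intercept-free logistic model on the within-pair differences x_{i0} - x_{i1}. Choosing different weights per block lets, for example, a small clinical block be penalized less heavily than a large omics block.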