Supervised dimension reduction (SDR) has attracted growing interest in data science because it reduces high-dimensional covariates while preserving their functional relation with a response variable of interest. However, existing SDR methods are not suited to datasets collected from case-control studies. In this setting, the goal is to learn and exploit the low-dimensional structure that is unique to or enriched in the case group, also known as the foreground group. Although some unsupervised techniques, such as the contrastive latent variable model and its variants, have been developed for this purpose, they fail to preserve the functional relationship between the dimension-reduced covariates and the response variable. In this paper, we propose a supervised dimension reduction method called contrastive inverse regression (CIR), designed specifically for the contrastive setting. CIR introduces an optimization problem defined on the Stiefel manifold with a non-standard loss function. We prove that a gradient descent-based algorithm for CIR converges to a local optimum, and our numerical study empirically demonstrates improved performance over competing methods on high-dimensional data.
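To make the phrase "gradient descent on the Stiefel manifold" concrete, the following is a minimal sketch of one common scheme: project the Euclidean gradient onto the tangent space, take a step, and map back to the manifold with a QR retraction. It is not the CIR algorithm. The loss used here, f(V) = -trace(VᵀAV), is a PCA-like stand-in chosen only because its minimizer is known, and the names `qr_retract` and `stiefel_gradient_descent`, the step size, and the iteration count are all assumptions made for illustration.

```python
import numpy as np

def qr_retract(M):
    """Map a full-rank p x d matrix to the Stiefel manifold via its QR factor."""
    Q, R = np.linalg.qr(M)
    # Flip column signs so that diag(R) > 0, which makes the factor unique.
    return Q * np.sign(np.diag(R))

def stiefel_gradient_descent(A, d, step=0.05, iters=500, seed=0):
    """Minimize the stand-in loss f(V) = -trace(V^T A V) subject to V^T V = I_d.

    A is assumed symmetric. This loss is NOT the CIR loss; it is a placeholder
    with a known optimum (the top-d eigenspace of A).
    """
    rng = np.random.default_rng(seed)
    p = A.shape[0]
    V = qr_retract(rng.standard_normal((p, d)))  # random point on the manifold
    for _ in range(iters):
        G = -2.0 * A @ V                         # Euclidean gradient of f
        sym = (V.T @ G + G.T @ V) / 2.0
        R = G - V @ sym                          # projection onto the tangent space at V
        V = qr_retract(V - step * R)             # descent step, then retraction
    return V

if __name__ == "__main__":
    A = np.diag([5.0, 4.0, 1.0, 0.5, 0.1])
    V = stiefel_gradient_descent(A, d=2)
    print("orthonormality error:", np.linalg.norm(V.T @ V - np.eye(2)))
    print("trace(V^T A V):", np.trace(V.T @ A @ V))
```

For this stand-in loss the iterates should approach the span of the two leading eigenvectors of A. A suitable step size depends on the scale of A, so the default above would need tuning for other inputs.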