Supervised matrix factorization (SMF) is a classical machine learning method that performs feature extraction and classification simultaneously, two objectives that are not necessarily aligned a priori. Our goal is to use SMF to learn low-rank latent factors that provide interpretable, data-reconstructive, and class-discriminative features, addressing the challenges posed by high-dimensional data. Training an SMF model involves solving a nonconvex and possibly constrained optimization problem with at least three blocks of parameters. Known algorithms are either heuristic or provide weak convergence guarantees only for special cases. In this paper, we provide a novel framework that 'lifts' SMF to a low-rank matrix estimation problem in a combined factor space, and we propose an efficient algorithm that provably converges exponentially fast to a global minimizer of the objective from arbitrary initialization under mild assumptions. Our framework applies to a wide range of SMF-type problems for multi-class classification with auxiliary features. To showcase an application, we demonstrate that our algorithm successfully identifies well-known cancer-associated gene groups for various cancers.
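The abstract does not spell out the lifted formulation, so the following is only a minimal sketch of one plausible reading. It assumes binary labels y in {-1, +1}, a logistic classifier acting on the codes H, no auxiliary features, and no extra constraints. Because the data are modeled as X ≈ WH and the logits as βH, the stacked matrix Z = [W; β]H has rank at most r, so one can optimize directly over Z under a rank constraint. The sketch does this with low-rank projected gradient descent (a gradient step followed by a truncated-SVD projection). All function names and the weight `xi` are hypothetical, and the code makes no claim to reproduce the paper's algorithm or its exponential convergence rate.

```python
import numpy as np


def logistic_loss(y, b):
    # Binary logistic loss log(1 + exp(-y * b)), summed over samples; y in {-1, +1}.
    return float(np.sum(np.logaddexp(0.0, -y * b)))


def lifted_objective(Z, X, y, xi):
    # Hypothetical lifted variable: the first p rows of Z stand in for W @ H
    # (reconstruction of X), and the last row stands in for beta @ H (logits).
    p = X.shape[0]
    A, b = Z[:p], Z[p]
    return float(np.sum((X - A) ** 2)) + xi * logistic_loss(y, b)


def lifted_gradient(Z, X, y, xi):
    p = X.shape[0]
    A, b = Z[:p], Z[p]
    grad = np.empty_like(Z)
    grad[:p] = 2.0 * (A - X)
    # d/db log(1 + exp(-y b)) = -y * sigmoid(-y b) = -y / (1 + exp(y b))
    grad[p] = -xi * y / (1.0 + np.exp(y * b))
    return grad


def project_rank(M, r):
    # Best rank-r approximation in Frobenius norm (Eckart-Young) via truncated SVD.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]


def lowrank_pgd(X, y, r, xi=1.0, iters=200):
    # Projected gradient descent onto {rank(Z) <= r}, starting from the feasible zero matrix.
    p, n = X.shape
    Z = np.zeros((p + 1, n))
    step = 1.0 / max(2.0, xi / 4.0)  # 1 / Lipschitz constant of the gradient
    history = [lifted_objective(Z, X, y, xi)]
    for _ in range(iters):
        Z = project_rank(Z - step * lifted_gradient(Z, X, y, xi), r)
        history.append(lifted_objective(Z, X, y, xi))
    return Z, history


def split_factors(Z, p, r):
    # Factor the rank-r lifted matrix back into stacked [W; beta] and shared codes H.
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    W_beta = U[:, :r] * s[:r]
    return W_beta[:p], W_beta[p:], Vt[:r]


rng = np.random.default_rng(0)
p, n, r = 20, 50, 3
X = rng.standard_normal((p, r)) @ rng.standard_normal((r, n))  # synthetic low-rank data
y = np.where(rng.standard_normal(n) > 0, 1.0, -1.0)  # synthetic binary labels
Z, history = lowrank_pgd(X, y, r)
W, beta, H = split_factors(Z, p, r)
```

With step size 1/L, where L is the Lipschitz constant of the gradient, each projected step cannot increase the objective, because the previous iterate is itself feasible. The factors returned by `split_factors` are unique only up to an invertible r × r change of basis, since [W; β]H = ([W; β]M)(M⁻¹H).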