Mixed Regression via Approximate Message Passing

From arXiv, 44 pages. To appear in the Journal of Machine Learning Research. A shorter version of this paper appeared in the proceedings of AISTATS 2023.

We study the problem of regression in a generalized linear model (GLM) with multiple signals and latent variables. This model, which we call a matrix GLM, covers many widely studied problems in statistical learning, including mixed linear regression, max-affine regression, and mixture-of-experts. In mixed linear regression, each observation comes from one of $L$ signal vectors (regressors), but we do not know which one; in max-affine regression, each observation comes from the maximum of $L$ affine functions, each defined via a different signal vector. The goal in all these problems is to estimate the signals, and possibly some of the latent variables, from the observations. We propose a novel approximate message passing (AMP) algorithm for estimation in a matrix GLM and rigorously characterize its performance in the high-dimensional limit. This characterization is given by a state evolution recursion, which allows us to precisely compute performance measures such as the asymptotic mean-squared error. The state evolution characterization can be used to tailor the AMP algorithm to exploit any structural information known about the signals. Using state evolution, we derive an optimal choice of AMP "denoising" functions that minimizes the estimation error in each iteration. The theoretical results are validated by numerical simulations for mixed linear regression, max-affine regression, and mixture-of-experts. For max-affine regression, we propose an algorithm that combines AMP with expectation-maximization to estimate the intercepts of the model along with the signals. The numerical results show that AMP significantly outperforms other estimators for mixed linear regression and max-affine regression in most parameter regimes.
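To make the two observation models described above concrete, here is a minimal sketch that generates synthetic data for mixed linear regression and max-affine regression. It is not the paper's AMP algorithm; it only illustrates how each observation depends on the $L$ inner products with the signal vectors plus a hidden latent label. Assumptions not taken from the abstract: Gaussian design entries with variance $1/n$ (a common AMP normalization), uniformly random labels for mixed linear regression, additive Gaussian noise, and the hypothetical helper names `sample_design`, `mixed_linear_regression`, and `max_affine_regression`.

```python
import random


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def sample_design(n, p, rng):
    # Assumed: i.i.d. Gaussian entries with variance 1/n (common in AMP analyses).
    return [[rng.gauss(0.0, 1.0 / n ** 0.5) for _ in range(p)] for _ in range(n)]


def mixed_linear_regression(X, betas, noise_sd, rng):
    """y_i = <x_i, beta_{c_i}> + noise, where the label c_i is hidden.

    Assumed: each label is drawn uniformly from the L signals.
    """
    y, labels = [], []
    for x in X:
        c = rng.randrange(len(betas))  # latent variable: which signal generated y_i
        labels.append(c)
        y.append(dot(x, betas[c]) + rng.gauss(0.0, noise_sd))
    return y, labels


def max_affine_regression(X, betas, intercepts, noise_sd, rng):
    """y_i = max_l (<x_i, beta_l> + b_l) + noise.

    The returned label records which affine piece attains the maximum.
    """
    y, labels = [], []
    for x in X:
        vals = [dot(x, b) + b0 for b, b0 in zip(betas, intercepts)]
        c = max(range(len(vals)), key=vals.__getitem__)  # active affine piece
        labels.append(c)
        y.append(vals[c] + rng.gauss(0.0, noise_sd))
    return y, labels


if __name__ == "__main__":
    rng = random.Random(0)
    n, p, L = 200, 20, 3
    X = sample_design(n, p, rng)
    betas = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(L)]
    y_mlr, c_mlr = mixed_linear_regression(X, betas, 0.1, rng)
    y_maf, c_maf = max_affine_regression(X, betas, [0.0, 0.5, -0.5], 0.1, rng)
    print(len(y_mlr), len(y_maf))
```

An estimator receives only `X` and `y`; the returned labels are kept so that signal and label recovery can be scored against the ground truth in simulation.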
