We consider the problem of parameter estimation from a generalized linear model with a random design matrix that is orthogonally invariant in law. Such a model allows the design to have an arbitrary distribution of singular values and only assumes that its singular vectors are generic. It is a vast generalization of the i.i.d. Gaussian design typically considered in the theoretical literature, and is motivated by the fact that real data often have a complex correlation structure, so that methods relying on i.i.d. assumptions can be highly suboptimal. Building on the paradigm of spectrally initialized iterative optimization, this paper proposes optimal spectral estimators and combines them with an approximate message passing (AMP) algorithm, establishing rigorous performance guarantees for these two algorithmic steps. Both the spectral initialization and the subsequent AMP meet existing conjectures on the fundamental limits to estimation: the former on the optimal sample complexity for efficient weak recovery, and the latter on the optimal errors. Numerical experiments suggest that our methods remain effective and our theory remains accurate beyond orthogonally invariant data.
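To make the notion of an orthogonally invariant design concrete, here is a minimal NumPy sketch that samples such a matrix as X = U diag(s) Vᵀ with Haar-distributed orthogonal U and V, and then draws responses from a generalized linear model. It is an illustration under stated assumptions, not the estimators proposed in the paper: the function names, the uniform singular-value distribution, the Gaussian signal, and the noiseless phase-retrieval-type link y = |Xβ| are all choices made here for the example.

```python
import numpy as np


def haar_orthogonal(dim, rng):
    """Draw a dim x dim orthogonal matrix from the Haar measure.

    Uses QR of a Gaussian matrix, then fixes column signs so that the
    diagonal of R is positive (the standard correction).
    """
    g = rng.standard_normal((dim, dim))
    q, r = np.linalg.qr(g)
    return q * np.sign(np.diag(r))


def sample_design(n, d, singular_values, rng):
    """Orthogonally invariant n x d design with prescribed singular values."""
    k = min(n, d)
    u = haar_orthogonal(n, rng)
    v = haar_orthogonal(d, rng)
    # X = U[:, :k] diag(s) V[:, :k]^T
    return (u[:, :k] * singular_values) @ v[:, :k].T


def sample_glm(n, d, rng):
    """Sample (X, y, beta, s) from an illustrative GLM (assumed link: |.|)."""
    # Assumption: any singular-value distribution is allowed; uniform is arbitrary.
    s = rng.uniform(0.5, 2.0, size=min(n, d))
    x = sample_design(n, d, s, rng)
    beta = rng.standard_normal(d)  # assumed Gaussian signal
    y = np.abs(x @ beta)           # assumed noiseless phase-retrieval-type output
    return x, y, beta, s


rng = np.random.default_rng(0)
X, y, beta, s = sample_glm(200, 50, rng)
```

The singular values of `X` coincide with `s` by construction, while its singular vectors carry no preferred direction. Replacing the uniform draw with a heavy-tailed or clustered spectrum gives designs that are far from i.i.d. Gaussian yet still fall within the model.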