Regularized linear regression is central to machine learning, yet its high-dimensional behavior with informative priors remains poorly understood. We provide the first exact asymptotic characterization of training and test risks for maximum a posteriori (MAP) regression with Gaussian priors centered at a domain-informed initialization. Our framework unifies ridge regression, least squares, and prior-informed estimators, and -- using random matrix theory -- yields closed-form risk formulas that expose the bias-variance-prior tradeoff, explain double descent, and quantify prior mismatch. We also identify a closed-form minimizer of test risk, enabling a simple estimator of the optimal regularization parameter. Simulations confirm the theory with high accuracy. By connecting Bayesian priors, classical regularization, and modern asymptotics, our results provide both conceptual clarity and practical guidance for learning with structured prior knowledge.
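To make the setup concrete, here is a minimal sketch of MAP regression with a Gaussian prior centered at a domain-informed initialization `beta0`: the estimator minimizes `||y - X b||^2 + lam * ||b - beta0||^2`, which has the closed form `beta0 + (X^T X + lam I)^{-1} X^T (y - X beta0)`. The function and variable names are illustrative, not from the paper.

```python
import numpy as np

def map_ridge(X, y, beta0, lam):
    """Prior-informed ridge / MAP estimate:
        argmin_b ||y - X b||^2 + lam * ||b - beta0||^2
    Closed form: beta0 + (X^T X + lam I)^{-1} X^T (y - X beta0).
    """
    p = X.shape[1]
    A = X.T @ X + lam * np.eye(p)
    return beta0 + np.linalg.solve(A, X.T @ (y - X @ beta0))

# As lam -> infinity the estimate shrinks to the prior center beta0;
# as lam -> 0 it approaches ordinary least squares, recovering the
# unification of ridge, least squares, and prior-informed estimators.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
beta_true = np.arange(5.0)
y = X @ beta_true + 0.1 * rng.standard_normal(50)
beta_hat = map_ridge(X, y, beta0=np.zeros(5), lam=1.0)
```

Choosing `beta0 = 0` recovers classical ridge regression; a nonzero `beta0` encodes structured prior knowledge, and the paper's closed-form risk formulas quantify how mismatch between `beta0` and the true coefficients affects test risk.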