To overcome the challenges of fitting complex models with small samples, catalytic priors have recently been proposed to stabilize inference by supplementing the observed data with synthetic data generated from simpler models. Based on a catalytic prior, the maximum a posteriori (MAP) estimator is a regularized estimator that maximizes the weighted likelihood of the combined data. This estimator is straightforward to compute, and its numerical performance is comparable or superior to that of other likelihood-based estimators. In this paper, we study several theoretical aspects of the MAP estimator in generalized linear models, with a particular focus on logistic regression. We first prove that, under mild conditions, the MAP estimator exists and is stable against the randomness in the synthetic data. We then establish the consistency of the MAP estimator when the dimension of the covariates diverges more slowly than the sample size. Furthermore, we use the convex Gaussian min-max theorem to characterize the asymptotic behavior of the MAP estimator when the dimension grows linearly with the sample size. These theoretical results clarify the role of the tuning parameters in a catalytic prior and provide insights for practical applications. We provide numerical studies to confirm that our asymptotic theory gives an effective approximation in finite samples and to illustrate how inference can be adjusted based on the theory.
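As a rough illustration of the weighted-likelihood idea described in the abstract, the following minimal sketch computes a catalytic-prior MAP estimate for logistic regression. It is not the authors' implementation. The function names (`weighted_logistic_fit`, `catalytic_map_logistic`), the choice of an intercept-only model as the "simpler model", the independent resampling of covariate columns, and the weighting of each of the `M` synthetic observations by `tau / M` are all assumptions made for illustration.

```python
import numpy as np


def weighted_logistic_fit(X, y, w, max_iter=100, tol=1e-8):
    """Maximize sum_i w_i * loglik_i(beta) for logistic regression
    using Newton's method with step halving."""

    def loglik(b):
        z = X @ b
        # log-likelihood of a Bernoulli response with logit link
        return np.sum(w * (y * z - np.logaddexp(0.0, z)))

    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ beta, -30, 30)))
        grad = X.T @ (w * (y - p))
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X
        step = np.linalg.solve(hess, grad)
        t = 1.0
        # halve the step until the objective does not decrease
        while loglik(beta + t * step) < loglik(beta) and t > 1e-8:
            t *= 0.5
        beta = beta + t * step
        if np.max(np.abs(t * step)) < tol:
            break
    return beta


def catalytic_map_logistic(X, y, tau, M=400, seed=0):
    """Hypothetical helper: MAP estimate under a catalytic-style prior.

    Assumptions (not taken from the paper): synthetic covariates are drawn
    by resampling each observed column independently, synthetic responses
    come from an intercept-only model, and each synthetic observation
    receives weight tau / M.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    X_syn = np.column_stack([rng.choice(X[:, j], size=M) for j in range(p)])
    y_syn = rng.binomial(1, y.mean(), size=M).astype(float)
    X_all = np.vstack([X, X_syn])
    y_all = np.concatenate([y, y_syn])
    w = np.concatenate([np.ones(n), np.full(M, tau / M)])
    return weighted_logistic_fit(X_all, y_all, w)


# Small-sample demo: n = 30 observations, intercept plus 4 covariates.
rng = np.random.default_rng(0)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=(n, 4))])
beta_true = np.array([0.0, 1.0, -1.0, 0.5, 0.0])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta_true))).astype(float)

beta_map = catalytic_map_logistic(X, y, tau=5.0)  # tau value is illustrative
print(beta_map)
```

In this sketch the synthetic observations act as a penalty with total weight `tau`, so the estimate remains finite even when the observed data alone are linearly separable and the ordinary maximum likelihood estimate does not exist. Larger values of `tau` pull the fit more strongly toward the simpler model.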