Mixtures of regressions are a powerful class of models for regression learning when the response variable of interest is highly uncertain and heterogeneous. Beyond serving as a rich predictive model for the response given some covariates, the parameters of this model class provide useful information about the heterogeneity of the data population, which is represented by the conditional distributions of the response given the covariates, each associated with one of a number of distinct but latent subpopulations. In this paper, we investigate conditions for strong identifiability, rates of convergence for conditional density and parameter estimation, and the Bayesian posterior contraction behavior arising in finite mixtures of regression models, under exact-fitted and over-fitted settings and when the number of components is unknown. The theory applies to common choices of link functions and families of conditional distributions employed by practitioners. We provide simulation studies and data illustrations, which shed some light on the parameter learning behavior found in several popular regression mixture models reported in the literature.