Healthcare cost prediction is challenging because the covariates are high-dimensional and highly correlated. In addition, cost data are typically skewed, heavy-tailed, and often multimodal owing to unobserved heterogeneity, which complicates modeling further. In this study, we propose a novel finite mixture regression framework that incorporates covariate clustering to better capture the effects of clustered covariates on different subgroups of the outcome, enabling a more accurate characterization of the data's complex distribution. The proposed framework is formulated as a convex optimization problem with an additional penalty term based on the prior similarity of the covariates. To solve this problem efficiently, we propose a specialized EM-ADMM algorithm that embeds the alternating direction method of multipliers (ADMM) in the iterations of the expectation-maximization (EM) algorithm. Simulation studies verify the convergence of the algorithm and the efficiency of the covariate clustering method, and two real Chinese medical expenditure datasets demonstrate the superiority of the approach over traditional regression techniques. Our empirical results offer valuable insights into the complex network structure of the covariates and can inform business practices such as the design and pricing of medical insurance products.
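The abstract does not state the exact penalty or update equations, so the following is only a minimal sketch of how an EM loop with an ADMM-solved M-step can be organized, not the authors' formulation. It assumes a Gaussian mixture of linear regressions without intercept, a hypothetical covariate similarity graph given as `edges` with weights `w`, and a graph-guided fused-lasso penalty λ Σ w_jl |β_kj − β_kl| applied to each component's coefficients. The E-step computes posterior responsibilities; the M-step updates each β_k by ADMM with σ_k² held at its current value (an ECM-style conditional step), then updates σ_k² and the mixing weights. All names (`edge_difference_matrix`, `admm_weighted_fused`, `em_admm`) and the data-scaled choice of ρ are illustrative assumptions.

```python
import numpy as np


def edge_difference_matrix(edges, p):
    """Hypothetical graph operator: one row e_j - e_l per covariate pair (j, l)."""
    D = np.zeros((len(edges), p))
    for r, (j, l) in enumerate(edges):
        D[r, j], D[r, l] = 1.0, -1.0
    return D


def admm_weighted_fused(X, y, g, D, w, lam, rho=None, n_iter=200):
    """ADMM for min_b 0.5*sum_i g_i (y_i - x_i b)^2 + lam * sum_e w_e |(D b)_e|."""
    XtG = X.T * g                                   # X^T diag(g) without forming diag(g)
    H = XtG @ X                                     # weighted Gram matrix
    if rho is None:
        rho = max(np.trace(H) / X.shape[1], 1e-6)   # heuristic: scale rho to the data curvature
    A = H + rho * D.T @ D + 1e-8 * np.eye(X.shape[1])  # tiny ridge keeps A invertible
    rhs0 = XtG @ y
    z = np.zeros(D.shape[0])                        # split variable z = D b
    u = np.zeros(D.shape[0])                        # scaled dual variable
    for _ in range(n_iter):
        b = np.linalg.solve(A, rhs0 + rho * D.T @ (z - u))           # quadratic b-update
        v = D @ b + u
        z = np.sign(v) * np.maximum(np.abs(v) - lam * w / rho, 0.0)  # soft-thresholding
        u = u + D @ b - z                                            # dual ascent
    return b


def em_admm(X, y, K, edges, w, lam, n_em=100, seed=0):
    """Sketch of EM for a K-component mixture of linear regressions with a fused penalty."""
    n, p = X.shape
    rng = np.random.default_rng(seed)
    D = edge_difference_matrix(edges, p)
    w = np.asarray(w, dtype=float)
    beta = rng.normal(size=(K, p))
    sigma2 = np.full(K, np.var(y))
    pi = np.full(K, 1.0 / K)
    for _ in range(n_em):
        # E-step: responsibilities gamma_ik, normalized in log space for stability
        resid = y[:, None] - X @ beta.T                              # n x K residuals
        logp = np.log(pi) - 0.5 * np.log(2 * np.pi * sigma2) - resid**2 / (2 * sigma2)
        logp -= logp.max(axis=1, keepdims=True)
        gamma = np.exp(logp)
        gamma /= gamma.sum(axis=1, keepdims=True)
        # M-step: mixing weights, then penalized coefficients and variances per component
        pi = np.clip(gamma.mean(axis=0), 1e-12, None)
        for k in range(K):
            # Dividing the loss by sigma_k^2 is equivalent to scaling lam by sigma_k^2
            beta[k] = admm_weighted_fused(X, y, gamma[:, k], D, w, lam * sigma2[k])
            r = y - X @ beta[k]
            sigma2[k] = max((gamma[:, k] @ r**2) / (gamma[:, k].sum() + 1e-12), 1e-8)
    return pi, beta, sigma2


# Usage on synthetic data: two latent subgroups, covariates {0, 1} and {2, 3} assumed similar
rng = np.random.default_rng(1)
n = 400
X = rng.normal(size=(n, 4))
labels = rng.random(n) < 0.5
true_b = np.array([[1.5, 1.5, -1.0, -1.0], [-1.5, -1.5, 1.0, 1.0]])
y = np.where(labels, X @ true_b[0], X @ true_b[1]) + 0.5 * rng.normal(size=n)
pi_hat, beta_hat, s2_hat = em_admm(X, y, K=2, edges=[(0, 1), (2, 3)], w=[1.0, 1.0], lam=100.0)
print(np.round(pi_hat, 2), np.round(beta_hat, 2), np.round(s2_hat, 3))
```

With a large λ the coefficients of covariates linked in the similarity graph are pulled to a common value within each component, which is one way a prior covariate similarity can induce covariate clusters. Solving the penalized M-step by ADMM separates the smooth weighted least-squares part (a linear solve) from the nonsmooth fusion part (a closed-form soft-threshold).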