Income inequality is a major contributor to health disparities, yet its effects often vary by geography, and income itself is commonly recorded as a compositional distribution (e.g., the proportions of households across income brackets). Existing spatial regression methods struggle in this setting: they typically assume smooth spatial variation, cannot accommodate abrupt spatial heterogeneity, and lack a principled treatment of compositional covariates. We propose a geographically weighted penalized compositional regression model that addresses these challenges simultaneously. Our method adopts a pairwise fusion penalty that detects both contiguous and noncontiguous regional clusters with shared regression effects, thereby relaxing strong assumptions of spatial smoothness and geographic contiguity. Regions with similar underlying socioeconomic structures can therefore be identified even when they are not geographically adjacent. By incorporating nonconvex penalties, such as the minimax concave penalty (MCP), the approach achieves improved estimation accuracy, interpretability, and scalability in high-dimensional spatial settings. We illustrate the method through an analysis linking U.S. income composition to chronic obstructive pulmonary disease (COPD) prevalence, revealing spatially heterogeneous associations that are obscured by conventional models. The proposed framework provides a flexible and robust tool for spatial data analysis involving compositional predictors and region-specific heterogeneity.
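To make the pairwise fusion idea concrete, the following is a minimal sketch of an MCP-based fusion penalty over region-specific coefficient vectors. It uses the standard MCP formula; the function names `mcp` and `pairwise_fusion_penalty`, the unweighted sum over all region pairs, and the toy values are assumptions for illustration and are not taken from the paper, whose actual objective (including any geographic weights and compositional constraints) may differ.

```python
import numpy as np

def mcp(t, lam, gamma):
    """Standard minimax concave penalty, evaluated elementwise at |t|.

    lam (> 0) is the penalty level and gamma (> 1) controls concavity.
    """
    a = np.abs(t)
    return np.where(
        a <= gamma * lam,
        lam * a - a**2 / (2.0 * gamma),  # concave region near zero
        0.5 * gamma * lam**2,            # constant beyond gamma * lam
    )

def pairwise_fusion_penalty(beta, lam, gamma):
    """Sum of MCP over Euclidean norms of all pairwise coefficient differences.

    beta has shape (n_regions, n_coefficients). Every pair (i, j) with i < j
    is included regardless of adjacency (an assumption of this sketch), so
    noncontiguous regions with equal coefficients add zero penalty.
    """
    n = beta.shape[0]
    i, j = np.triu_indices(n, k=1)
    diffs = np.linalg.norm(beta[i] - beta[j], axis=1)
    return float(mcp(diffs, lam, gamma).sum())

# Hypothetical toy input: regions 0 and 1 share coefficients, region 2 differs.
beta = np.array([[0.0, 0.0], [0.0, 0.0], [3.0, 4.0]])
penalty = pairwise_fusion_penalty(beta, lam=1.0, gamma=3.0)
```

Because MCP is flat beyond `gamma * lam`, large differences between genuinely distinct clusters are not penalized further, while small differences are shrunk toward zero, which is what lets regions fuse into clusters.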