Compositional data are widely used across many fields because they represent the proportions or percentages of the components of a whole. Such data often exhibit spatial dependence, particularly when they describe land uses or ecological variables. Ignoring spatial autocorrelation when modelling compositional data can lead to incorrect parameter estimates, so spatial information must be incorporated into the statistical analysis to obtain accurate and reliable results. Traditional statistical methods, however, are not directly applicable to compositional data, because the components of each observation are constrained to lie on a simplex and are therefore correlated. The Dirichlet distribution is commonly used to address this challenge, since its support matches the nature of compositional vectors. In particular, the R package DirichletReg provides a regression model for compositional data, termed Dirichlet regression. This model does not account for spatial dependence, which limits its usefulness in spatial contexts. In this study, we introduce a novel spatial autoregressive Dirichlet regression model for compositional data that integrates spatial dependence among observations. We construct a maximum likelihood estimator for a Dirichlet density function augmented with a spatial lag term. We compare this spatial autoregressive model with the same model without the spatial lag, testing both on synthetic data and on two real datasets using several metrics. By considering the spatial relationships among observations, our model provides more accurate and reliable results for the analysis of compositional data. The model is further evaluated against a spatial multinomial regression model for compositional data, and their relative effectiveness is discussed.
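To make the idea of a Dirichlet likelihood with a spatial lag term concrete, the following is a minimal sketch under stated assumptions, not the estimator constructed in the paper. It assumes that the spatial lag acts on the linear predictor, M = (I - rho W)^{-1} X beta, and that the Dirichlet parameters follow from a log link, alpha = exp(M). The function names, the chain-shaped weights matrix and all simulated values are hypothetical and serve only as illustration.

```python
import numpy as np
from scipy.special import gammaln


def row_normalized_chain_weights(n):
    """Toy spatial weights: sites on a line, each neighbouring the adjacent sites."""
    W = np.zeros((n, n))
    idx = np.arange(n - 1)
    W[idx, idx + 1] = 1.0
    W[idx + 1, idx] = 1.0
    return W / W.sum(axis=1, keepdims=True)  # each row sums to 1


def sar_dirichlet_loglik(rho, beta, X, W, Y):
    """Dirichlet log-likelihood with a spatially lagged linear predictor.

    Assumed specification (illustrative, not taken from the paper):
        M = (I - rho * W)^{-1} X beta,  alpha = exp(M),  Y[i] ~ Dirichlet(alpha[i])
    Shapes: X is (n, K), beta is (K, J), W is (n, n), Y is (n, J) with rows on the simplex.
    """
    n = X.shape[0]
    # Spatially filtered linear predictor: solve (I - rho W) M = X beta.
    M = np.linalg.solve(np.eye(n) - rho * W, X @ beta)
    alpha = np.exp(M)
    # Sum of the Dirichlet log-densities over the n observations.
    return float(np.sum(
        gammaln(alpha.sum(axis=1))
        - gammaln(alpha).sum(axis=1)
        + ((alpha - 1.0) * np.log(Y)).sum(axis=1)
    ))


# Small simulated example; every value below is illustrative.
rng = np.random.default_rng(0)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=n)])
W = row_normalized_chain_weights(n)
beta_true = np.array([[0.5, 0.2, 0.0], [0.3, -0.3, 0.0]])
rho_true = 0.5
alpha_true = np.exp(np.linalg.solve(np.eye(n) - rho_true * W, X @ beta_true))
Y = np.vstack([rng.dirichlet(a) for a in alpha_true])

print(sar_dirichlet_loglik(rho_true, beta_true, X, W, Y))  # with the spatial lag
print(sar_dirichlet_loglik(0.0, beta_true, X, W, Y))       # same model, no spatial lag
```

Setting rho to zero recovers a non-spatial Dirichlet regression under the same link, which mirrors the comparison described in the abstract. A maximum likelihood fit could be obtained by passing the negative of this function to a numerical optimizer such as `scipy.optimize.minimize`; the parametrization and estimation details used in the study itself may differ.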