Compositional Data Analysis (CoDa) has gained popularity in recent years. Compositional data consist of values from disjoint categories that sum to a constant. Both Dirichlet regression and logistic-normal regression have become popular methods for analysing CoDa. However, fitting these multivariate models is challenging, especially when the model includes structured random effects such as temporal or spatial effects. To overcome these challenges, we propose the logistic-normal Dirichlet Model (LNDM). We seamlessly incorporate this approach into the R-INLA package, which facilitates model fitting and prediction within the framework of Latent Gaussian Models (LGMs). Moreover, we explore metrics such as the Deviance Information Criterion (DIC), the Watanabe-Akaike information criterion (WAIC), and the cross-validation measure conditional predictive ordinate (CPO) for CoDa model selection in R-INLA. We illustrate LNDM with a simple simulated example and an ecological case study on Arabidopsis thaliana in the Iberian Peninsula, underscoring its potential as an effective tool for managing CoDa and large CoDa databases.