Compositional Data Analysis (CoDa) has gained popularity in recent years. Compositional data consist of values from disjoint categories that sum to a constant. Dirichlet regression and logistic-normal regression have both become popular methods for analyzing CoDa. However, fitting these multivariate models is challenging, especially when they include structured random effects such as temporal or spatial effects. To overcome these challenges, we propose the logistic-normal Dirichlet model (LNDM). We seamlessly integrate this approach into the R-INLA package, enabling model fitting and prediction within the framework of Latent Gaussian Models (LGMs). Moreover, we explore the Deviance Information Criterion (DIC), the Watanabe-Akaike information criterion (WAIC), and the cross-validation measure conditional predictive ordinate (CPO) for model selection with CoDa in R-INLA. Illustrating the LNDM with a simple simulated example and an ecological case study on Arabidopsis thaliana in the Iberian Peninsula, we underscore its potential as an effective tool for analyzing CoDa, including large CoDa databases.
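The code below sketches the constant-sum constraint and how logistic-normal approaches work around it. It assumes the additive log-ratio (ALR) transform with the last category as reference. This is a common CoDa choice, but it is not necessarily the transform used inside the LNDM. All function names are hypothetical, and this is not the R-INLA implementation.

```python
import math
import random

def closure(x, total=1.0):
    """Rescale strictly positive parts so they sum to `total` (the constant-sum constraint)."""
    s = sum(x)
    return [total * xi / s for xi in x]

def alr(p):
    """Additive log-ratio transform: log(p_j / p_D) for j = 1..D-1, last part as reference."""
    ref = p[-1]
    return [math.log(pj / ref) for pj in p[:-1]]

def alr_inv(z):
    """Inverse ALR: exponentiate, append the reference part (exp(0) = 1), then close to sum 1."""
    return closure([math.exp(zj) for zj in z] + [1.0])

def sample_logistic_normal(mu, sigma, rng):
    """Draw one composition from a logistic-normal distribution.

    Simplifying assumption: the ALR coordinates are independent normals
    (diagonal covariance); a full model would use a general covariance matrix.
    """
    z = [rng.gauss(m, s) for m, s in zip(mu, sigma)]
    return alr_inv(z)

if __name__ == "__main__":
    # Hypothetical three-category example: raw amounts closed to proportions.
    p = closure([2.0, 3.0, 5.0])
    z = alr(p)  # unconstrained vector in R^(D-1), suitable for Gaussian modelling
    print(p, z, alr_inv(z))
    rng = random.Random(42)
    print(sample_logistic_normal([0.0, 0.5], [1.0, 1.0], rng))
```

The inverse ALR maps any real vector to a valid composition. As a result, Gaussian (and hence latent Gaussian) machinery can be applied on the ALR scale. That is the general idea behind logistic-normal regression.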