Compositional Data Analysis (CoDa) has gained popularity in recent years. Compositional data consist of values for disjoint categories that sum to a constant, such as proportions summing to one. Dirichlet regression and logistic-normal regression have both become popular methods for analysing CoDa. However, fitting these multivariate models is challenging, especially when they include structured random effects such as temporal or spatial effects. To overcome these challenges, we propose the logistic-normal Dirichlet model (LNDM). We seamlessly incorporate this approach into the \textbf{R-INLA} package, which facilitates model fitting, model selection, and prediction within the framework of Latent Gaussian Models (LGMs). Moreover, we explore the Deviance Information Criterion (DIC), the Watanabe-Akaike information criterion (WAIC), and the cross-validation-based conditional predictive ordinate (CPO) as model selection tools for CoDa in \textbf{R-INLA}. Illustrating the LNDM through a simple simulated example and an ecological case study on \textit{Arabidopsis thaliana} in the Iberian Peninsula, we underscore its potential as an effective tool for analysing CoDa, including large CoDa databases.