The Nested Dirichlet Distribution (NDD) provides a flexible alternative to the Dirichlet distribution for modeling compositional data, relaxing the Dirichlet's constraints on component variances and correlations through a hierarchical tree structure. While theoretically appealing, the NDD is underused in practice because of two main limitations: the tree structure must be specified in advance, and diagnostics for evaluating model fit are lacking. This paper addresses both issues. First, we introduce a data-driven, greedy tree-finding algorithm that identifies plausible NDD tree structures from observed data. Second, we propose novel diagnostic tools: pseudo-residuals based on a saddlepoint approximation to the marginal distributions, and a likelihood displacement measure for detecting influential observations. These tools provide accurate and computationally tractable assessments of model fit, even when the marginal distributions are analytically intractable. We demonstrate our approach through simulation studies and apply it to data from a Morris water maze experiment, where the goal is to detect differences in spatial learning strategies between cognitively impaired and unimpaired mice. Our methods yield interpretable tree structures and improved model evaluation in a realistic compositional setting. An accompanying R package is provided to support reproducibility and application to new datasets.
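To make the hierarchical tree structure concrete, the following is a minimal sketch of drawing one composition from a nested Dirichlet, assuming the usual construction in which each internal node splits its mass among its children with an independent Dirichlet draw and each leaf receives the product of the weights along its path. The tree encoding, the function names, and the parameter values are hypothetical and chosen for illustration only; they are not taken from the paper or from its R package.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma variates."""
    draws = [rng.gammavariate(alpha, 1.0) for alpha in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def sample_nested_dirichlet(tree, rng):
    """Return {leaf_name: proportion} for one draw from the tree.

    Hypothetical encoding (an assumption for this sketch): a leaf is a
    string, and an internal node is a list of (alpha, child) pairs.
    """
    composition = {}

    def descend(node, mass):
        if isinstance(node, str):
            # Leaf: its proportion is the product of weights along the path.
            composition[node] = mass
            return
        alphas = [alpha for alpha, _ in node]
        weights = sample_dirichlet(alphas, rng)
        for weight, (_, child) in zip(weights, node):
            descend(child, mass * weight)

    descend(tree, 1.0)
    return composition


# Illustrative tree: A sits at the root, while B and C share a nested node.
example_tree = [(2.0, "A"), (3.0, [(1.5, "B"), (4.0, "C")])]
rng = random.Random(0)
composition = sample_nested_dirichlet(example_tree, rng)
```

In this sketch the nested node lets B and C covary differently from A, which is the kind of flexibility a single flat Dirichlet cannot express.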