Dirichlet regression models are suitable for compositional data, in which the response is a vector of proportions that sum to one. However, well-established methods for constructing valid prediction sets in this setting are still lacking, particularly methods that respect the geometry of the compositional space. In this work, we investigate conformal prediction-based strategies for constructing valid prediction regions in Dirichlet regression models. We evaluate three distinct approaches: a method based on quantile residuals, an approximate construction of highest density regions (HDR), and an adaptation of the approximate HDR that uses grid-based discretization over the simplex. We assess the performance of the methods through simulation studies that vary model complexity, response dimensionality, and covariate structure. The results indicate that the approximate HDR approach maintains robust coverage, while grid discretization reduces both overcoverage and the area of the prediction region relative to the original approximate HDR method. The quantile-residual method also maintains adequate coverage but produces larger prediction regions than the grid-based method. We further apply the methods to two real datasets: one on sleep stages and another on biomass allocation in plants. In both cases, the methods proved feasible in practice and yielded coherent interpretations within the compositional space. Finally, we discuss possible extensions of this work.
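The abstract does not give implementation details, so the following is only a minimal Python sketch, under our own assumptions, of the general idea behind an HDR-style conformal region with grid discretization: split conformal prediction using the negative Dirichlet log-density as the nonconformity score, with the region approximated by the interior points of a regular grid over the 3-part simplex. The function names (`dirichlet_logpdf`, `conformal_threshold`, `simplex_grid`, `prediction_region`, `sample_dirichlet`) are hypothetical, the concentration vector `alpha` stands in for parameters assumed to come from an already-fitted Dirichlet regression, and this is not necessarily the exact construction used in this work. The quantile-residual method is not sketched.

```python
import math
import random


def dirichlet_logpdf(y, alpha):
    """Log-density of Dirichlet(alpha) at a point y in the open simplex."""
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(v) for a, v in zip(alpha, y))


def conformal_threshold(scores, level):
    """Split-conformal cutoff: the ceil((n + 1) * level)-th smallest score."""
    n = len(scores)
    k = math.ceil((n + 1) * level)
    if k > n:
        return math.inf  # too few calibration points: region is the whole simplex
    return sorted(scores)[k - 1]


def simplex_grid(m):
    """Interior points (i/m, j/m, k/m) of the 3-part simplex with i + j + k = m, all >= 1."""
    return [(i / m, j / m, (m - i - j) / m)
            for i in range(1, m - 1)
            for j in range(1, m - i)]


def prediction_region(alpha_new, threshold, grid):
    """Grid points whose nonconformity score -log f(y | alpha_new) is <= threshold."""
    return [y for y in grid if -dirichlet_logpdf(y, alpha_new) <= threshold]


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet(alpha) vector by normalizing independent Gamma(a, 1) draws."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return tuple(x / total for x in g)


# Toy usage: alpha is a hypothetical fitted concentration vector, assumed
# identical for every calibration and test point to keep the example small.
rng = random.Random(0)
alpha = (4.0, 3.0, 2.0)
calib = [sample_dirichlet(alpha, rng) for _ in range(500)]
scores = [-dirichlet_logpdf(y, alpha) for y in calib]
q = conformal_threshold(scores, 0.90)

grid = simplex_grid(50)
region = prediction_region(alpha, q, grid)
area_fraction = len(region) / len(grid)  # share of grid points kept: a proxy for region area

test_points = [sample_dirichlet(alpha, rng) for _ in range(2000)]
coverage = sum(-dirichlet_logpdf(y, alpha) <= q for y in test_points) / len(test_points)
print(f"threshold={q:.3f}  area_fraction={area_fraction:.3f}  coverage={coverage:.3f}")
```

Because the grid is evenly spaced in barycentric coordinates, the fraction of grid points retained approximates the fraction of the simplex covered by the region; a finer grid (larger `m`) trades computation for a closer approximation of the region's boundary.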