Compositional data arise in many real-life applications, and versatile methods for properly analyzing this type of data in a regression context are needed. When parametric assumptions do not hold or are difficult to verify, non-parametric regression models can provide a convenient alternative for prediction. To this end, we consider an extension of classical $k$--$NN$ regression, termed $\alpha$--$k$--$NN$ regression, that yields a highly flexible non-parametric regression model for compositional data through the use of the $\alpha$-transformation. Unlike in many of the recommended regression models for compositional data, zero values (which commonly occur in practice) are not problematic and can be incorporated into the proposed models without modification. Extensive simulation studies and real-life data analyses highlight the advantage of these non-parametric regressions when the relationship between the compositional response data and the Euclidean predictor variables is complex. Both suggest that $\alpha$--$k$--$NN$ regression can yield more accurate predictions than current regression models, which assume a sometimes restrictive parametric relationship with the predictor variables. In addition, in contrast to current regression techniques, $\alpha$--$k$--$NN$ regression is computationally efficient, which makes it attractive for large-scale or massive data sets.
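To make the idea concrete, the following is a minimal sketch of how an $\alpha$--$k$--$NN$ prediction could be computed. The abstract does not spell out the estimator, so the sketch rests on stated assumptions: neighbours are found by Euclidean distance in predictor space, and the prediction is an $\alpha$-type mean of the neighbours' compositional responses (power-transform each composition, average, invert the power, re-close). The function names `closure`, `alpha_mean`, and `alpha_knn_predict` are hypothetical and chosen only for illustration.

```python
import math


def closure(v):
    """Rescale a non-negative vector so that its parts sum to 1."""
    total = sum(v)
    return [x / total for x in v]


def alpha_mean(comps, alpha):
    """Alpha-type mean of a list of compositions (assumed form, not from the abstract).

    alpha > 0 : power-transform each composition, average, invert the power, close.
                Zero parts are allowed because 0 ** alpha == 0 for alpha > 0.
    alpha == 0: closed geometric mean; requires strictly positive parts.
    """
    n = len(comps)
    d = len(comps[0])
    if alpha == 0:
        mean_logs = [sum(math.log(c[j]) for c in comps) / n for j in range(d)]
        return closure([math.exp(m) for m in mean_logs])
    powered = [closure([x ** alpha for x in c]) for c in comps]
    avg = [sum(p[j] for p in powered) / n for j in range(d)]
    return closure([a ** (1.0 / alpha) for a in avg])


def alpha_knn_predict(X_train, Y_train, x_new, k, alpha):
    """Predict a composition at x_new from its k nearest training points."""
    # Euclidean distances in predictor space (an assumption of this sketch).
    order = sorted(range(len(X_train)), key=lambda i: math.dist(X_train[i], x_new))
    neighbours = [Y_train[i] for i in order[:k]]
    return alpha_mean(neighbours, alpha)


# Toy data: one Euclidean predictor, three-part compositional responses,
# including zero parts.
X_train = [[0.0], [1.0], [2.0], [10.0]]
Y_train = [
    [0.2, 0.3, 0.5],
    [0.0, 0.4, 0.6],
    [0.1, 0.1, 0.8],
    [0.9, 0.1, 0.0],
]
pred = alpha_knn_predict(X_train, Y_train, [1.1], k=2, alpha=1.0)
```

In this sketch, $\alpha = 1$ reduces the prediction to the arithmetic mean of the neighbours' compositions, while $\alpha = 0$ gives their closed geometric mean and therefore requires strictly positive parts. Both $k$ and $\alpha$ would in practice be tuned, for example by cross-validation.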