The analysis of compositional data has been dominated by logratio transformations, which ensure exact subcompositional coherence and, in some situations, exact isometry as well. A problem with this approach is that data zeros, found in most applications, have to be replaced before the logarithmic transformation can be applied. A new alternative approach, called the 'chiPower' transformation, allows data zeros: it combines the standardization inherent in the chi-square distance of correspondence analysis with the essential elements of the Box-Cox power transformation. The chiPower transformation is justified because it defines between-sample distances that tend to logratio distances for strictly positive data as the power parameter tends to zero, so that in the limit it is equivalent to transforming to logratios. For data with zeros, a value of the power can be identified that brings the chiPower transformation as close as possible to a logratio transformation, without having to substitute the zeros. Especially for high-dimensional data, this alternative can achieve levels of coherence and isometry high enough to make it a valid approach to the analysis of compositional data. Furthermore, in a supervised learning context, if the compositional variables serve as predictors of a response in a modelling framework, for example generalized linear models, then the power can be used as a tuning parameter to optimize the accuracy of prediction through cross-validation. The chiPower-transformed variables have a straightforward interpretation, since each is identified with a single compositional part, not a ratio.
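The limiting behaviour described above can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: the abstract does not state the formula, so the ordering of steps assumed here (power-transform, re-close, divide by the square roots of the column means, scale by 1/power) is an assumption, as are the function names `close`, `chipower`, `clr` and `pairwise_distances` and the simulated data. Under these assumptions, the between-sample Euclidean distances on strictly positive data approach the centred-logratio distances divided by the square root of the number of parts as the power shrinks.

```python
import numpy as np

def close(x):
    """Rescale each row (sample) to sum to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=1, keepdims=True)

def chipower(x, power):
    """Assumed chiPower form: power, re-close, chi-square standardize, scale."""
    powered = close(close(x) ** power)           # Box-Cox-style power, then re-closure
    col_means = powered.mean(axis=0)             # average profile of the parts
    return powered / np.sqrt(col_means) / power  # chi-square scaling and 1/power

def clr(x):
    """Centred logratios; defined only for strictly positive data."""
    logx = np.log(close(x))
    return logx - logx.mean(axis=1, keepdims=True)

def pairwise_distances(z):
    """Euclidean distances between all pairs of rows."""
    diff = z[:, None, :] - z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Simulated strictly positive compositions: 6 samples, 5 parts (illustrative only).
rng = np.random.default_rng(0)
positive = close(rng.uniform(0.1, 1.0, size=(6, 5)))
n_parts = positive.shape[1]

# Logratio distances with equal part weights 1/n_parts.
target = pairwise_distances(clr(positive)) / np.sqrt(n_parts)

gaps = {}
for power in (1.0, 0.5, 0.1, 0.001):
    dist = pairwise_distances(chipower(positive, power))
    gaps[power] = np.abs(dist - target).max()
    print(f"power={power}: largest gap to logratio distances = {gaps[power]:.5f}")

# A data zero needs no replacement, because 0 ** power is 0 for power > 0.
with_zeros = positive.copy()
with_zeros[0, 0] = 0.0
with_zeros = close(with_zeros)
z = chipower(with_zeros, 0.25)
print(np.isfinite(z).all())
```

In this sketch a part that is zero in every sample would have a zero column mean and would need to be dropped first. In a supervised setting, `power` could be treated like any other hyperparameter and chosen by cross-validation over a grid of positive values.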