Compositional data are characterized by the fact that their elemental information is contained in simple pairwise logratios of the parts that constitute the composition. While pairwise logratios are typically easy to interpret, the number of possible pairs grows quadratically with the number of parts (a composition with D parts yields D(D-1)/2 distinct pairs) and quickly becomes too large even for medium-sized compositions, which can hinder interpretability in further multivariate analyses. Sparse methods can therefore be useful for identifying a few important pairwise logratios (and the parts they contain) from the total candidate set. To this end, we propose a procedure that constructs all possible pairwise logratios and applies sparse principal component analysis to identify the important ones. The performance of the procedure is demonstrated with both simulated and real-world data. In our empirical analyses, we propose three visual tools showing (i) the balance between sparsity and explained variability, (ii) the stability of the pairwise logratios, and (iii) the importance of the original compositional parts, to aid practitioners with their model interpretation.
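The two steps named in the abstract (build every pairwise logratio, then run a sparse PCA on them) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the simulated five-part composition, the helper names `pairwise_logratios` and `sparse_first_component`, and above all the sparsity mechanism. The abstract does not say which sparse PCA formulation the authors use, so the sketch substitutes a simple truncated power iteration that keeps a fixed number of nonzero loadings; it is not the authors' procedure.

```python
import itertools
import numpy as np


def pairwise_logratios(X, names):
    """Return all D(D-1)/2 logratios log(x_j / x_k), j < k, with labels."""
    logX = np.log(X)
    pairs = list(itertools.combinations(range(X.shape[1]), 2))
    Z = np.column_stack([logX[:, j] - logX[:, k] for j, k in pairs])
    labels = [f"log({names[j]}/{names[k]})" for j, k in pairs]
    return Z, labels


def sparse_first_component(Z, n_nonzero, n_iter=200):
    """Stand-in sparse PCA (assumption): truncated power iteration.

    After each multiplication by the covariance matrix, only the
    n_nonzero largest loadings (in absolute value) are kept.
    """
    Zc = Z - Z.mean(axis=0)
    S = Zc.T @ Zc / (Z.shape[0] - 1)
    # Deterministic start: unit vector on the highest-variance logratio.
    v = np.zeros(S.shape[0])
    v[np.argmax(np.diag(S))] = 1.0
    for _ in range(n_iter):
        w = S @ v
        keep = np.argsort(np.abs(w))[-n_nonzero:]
        w_trunc = np.zeros_like(w)
        w_trunc[keep] = w[keep]
        v = w_trunc / np.linalg.norm(w_trunc)
    # Share of total logratio variance captured by the sparse direction.
    explained = float(v @ S @ v / np.trace(S))
    return v, explained


# Hypothetical simulated data: part "A" varies strongly, the rest barely.
rng = np.random.default_rng(0)
names = ["A", "B", "C", "D", "E"]
log_parts = rng.normal(size=(100, 5)) * np.array([2.0, 0.1, 0.1, 0.1, 0.1])
X = np.exp(log_parts)
X = X / X.sum(axis=1, keepdims=True)  # close each row to sum to 1

Z, labels = pairwise_logratios(X, names)
loadings, explained = sparse_first_component(Z, n_nonzero=4)
selected = [lab for lab, w in zip(labels, loadings) if w != 0.0]

print("number of candidate logratios:", Z.shape[1])
print("selected logratios:", selected)
print("share of variance explained:", round(explained, 3))
```

Varying `n_nonzero` and recording `explained` for each value gives a rough version of the sparsity versus explained-variability trade-off that the first visual tool is described as showing; counting how often each part appears among the selected logratios is one simple way to approximate part importance.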