Correspondence analysis is a dimension reduction method for visualizing nonnegative data sets, in particular contingency tables, but its results depend on the marginals of the data set. Two transformations of the data have been proposed to render correspondence analysis row- and column-scale invariant; both change the initial form of the data set into a bistochastic form. The power transformation applied by Greenacre (2010) has one positive parameter, whereas the transformation applied by Mosteller (1968) and Goodman (1996) has (I+J) positive parameters: the raw data are row and column scaled by the Sinkhorn (RAS or IPF) algorithm to render them bistochastic. Goodman (1996) named correspondence analysis of a bistochastic matrix marginal-free correspondence analysis. We discuss these two transformations and further generalize the Mosteller-Goodman approach.
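As an illustration of the row and column scaling mentioned above, the following is a minimal sketch of Sinkhorn (RAS/IPF) iterations in Python with NumPy. It is not taken from the cited papers. The function name `sinkhorn_scale`, the tolerance, and the iteration cap are hypothetical choices, and the sketch assumes a strictly positive I x J table and the convention that "bistochastic" means uniform marginals (each row sums to 1/I and each column to 1/J). The I row multipliers `a` and J column multipliers `b` play the role of the (I+J) positive parameters.

```python
import numpy as np

def sinkhorn_scale(N, tol=1e-10, max_iter=10_000):
    """Scale a strictly positive I x J table to uniform marginals.

    Returns P = diag(a) @ N @ diag(b), with row sums 1/I and column
    sums 1/J (an assumed convention for a rectangular table), together
    with the row multipliers a and column multipliers b.
    """
    N = np.asarray(N, dtype=float)
    if N.ndim != 2 or np.any(N <= 0):
        # Assumption of this sketch: zeros need separate treatment.
        raise ValueError("expected a 2-D array with strictly positive entries")
    I, J = N.shape
    a = np.ones(I)
    b = np.ones(J)
    for _ in range(max_iter):
        a = (1.0 / I) / (N @ b)    # make every row sum equal 1/I
        b = (1.0 / J) / (N.T @ a)  # make every column sum equal 1/J
        P = a[:, None] * N * b[None, :]
        # Column sums are exact after the b update, so check the rows.
        if np.abs(P.sum(axis=1) - 1.0 / I).max() < tol:
            break
    return P, a, b

# Toy table (made-up counts) and a row/column rescaled copy of it.
N = np.array([[10.0, 20.0, 30.0],
              [5.0, 40.0, 5.0]])
P, a, b = sinkhorn_scale(N)
rescaled = np.diag([3.0, 0.5]) @ N @ np.diag([2.0, 1.0, 7.0])
P_rescaled, _, _ = sinkhorn_scale(rescaled)
```

In this sketch, `P` and `P_rescaled` agree up to the numerical tolerance, which is the scale invariance that motivates applying correspondence analysis to the scaled table rather than to the raw one.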