Distortion is a fundamental, well-studied topic in dimension reduction, and it is intimately related to the intrinsic dimension of a mapping of a high-dimensional data set onto a lower-dimensional space. In this paper, we study the embedding distortions produced by Correspondence Analysis and its robust l1 variant, Taxicab Correspondence Analysis, both of which are visualization methods for contingency tables. For high-dimensional data, distortions in Correspondence Analysis are contractions, whereas distortions in Taxicab Correspondence Analysis can be either contractions or stretchings. This shows that Euclidean geometry is quite rigid because of the orthogonality property, while Taxicab geometry is quite flexible because the orthogonality property is replaced by the conjugacy property.
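The contraction claim for Correspondence Analysis can be illustrated with a minimal sketch. It assumes classical CA computed through the SVD of the standardized residual matrix, and it takes "distortion" to mean the ratio of a pairwise distance in the two-dimensional map to the corresponding chi-square distance between row profiles. That definition, the random table, and all function names are illustrative assumptions, not taken from the paper, and the Taxicab variant is not covered here.

```python
import numpy as np

def ca_row_coordinates(N):
    """Principal row coordinates of classical CA for a contingency table N."""
    P = N / N.sum()                      # correspondence matrix
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses
    # Standardized residuals: (P - r c^T) / sqrt(r_i c_j)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    F = (U / np.sqrt(r)[:, None]) * s    # F = D_r^{-1/2} U Sigma
    return F, r, c

def pairwise_distances(X):
    """Euclidean distances between all pairs of rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Hypothetical 6 x 8 contingency table with strictly positive counts.
rng = np.random.default_rng(0)
N = rng.integers(1, 50, size=(6, 8)).astype(float)

F, r, c = ca_row_coordinates(N)

# Chi-square distances between row profiles (the "true" distances in CA).
profiles = (N / N.sum()) / r[:, None]
chi2 = pairwise_distances(profiles / np.sqrt(c))

full = pairwise_distances(F)             # all CA dimensions retained
d2 = pairwise_distances(F[:, :2])        # two-dimensional CA map

# Distortion ratios for distinct pairs: values <= 1 are contractions.
iu = np.triu_indices(N.shape[0], k=1)
ratio = d2[iu] / chi2[iu]
print("largest distortion ratio in the 2-D map:", ratio.max())
```

Because the map is an orthogonal projection of the full set of principal coordinates, dropping axes can only shorten Euclidean distances, which is the rigidity the abstract attributes to the orthogonality property.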