Correspondence analysis (CA) is a popular technique for visualizing the relationship between two categorical variables. CA operates on a two-way contingency table and is sensitive to outliers. A common way to handle outliers is the supplementary points method, which has the disadvantage of removing the information in entire rows or columns from the analysis. However, outlying behavior may be confined to individual cells. In this paper, a reconstitution algorithm is introduced to cope with such cells. The algorithm reduces the contribution of the outlying cells to the CA solution instead of deleting entire rows or columns, so the remaining information in the affected rows and columns can still be used in the analysis. The reconstitution algorithm is compared with two alternative methods for handling outliers, the supplementary points method and MacroPCA. The comparison shows that the proposed strategy works well.
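To make concrete how a single cell can dominate a CA solution, here is a minimal sketch of standard CA computed via the singular value decomposition of the standardized residuals, together with each cell's share of the total inertia. It is not the reconstitution algorithm, whose details the abstract does not give. The function name `correspondence_analysis` and the table values are hypothetical, chosen only so that one cell is visibly inflated.

```python
import numpy as np


def correspondence_analysis(table):
    """Standard CA of a two-way contingency table (illustrative sketch)."""
    N = np.asarray(table, dtype=float)
    P = N / N.sum()                      # correspondence matrix
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses
    E = np.outer(r, c)                   # expected proportions under independence
    S = (P - E) / np.sqrt(E)             # standardized residuals

    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    row_coords = (U * sv) / np.sqrt(r)[:, None]     # principal row coordinates
    col_coords = (Vt.T * sv) / np.sqrt(c)[:, None]  # principal column coordinates
    inertia = sv ** 2                    # principal inertias per dimension

    # Share of the total inertia attributable to each cell.
    cell_share = S ** 2 / (S ** 2).sum()
    return row_coords, col_coords, inertia, cell_share


# Hypothetical 4 x 3 table; the last cell of the last row is inflated.
table = np.array([
    [20, 15, 10],
    [18, 16, 12],
    [22, 14, 11],
    [19, 15, 90],
])

row_coords, col_coords, inertia, cell_share = correspondence_analysis(table)
worst_cell = np.unravel_index(cell_share.argmax(), cell_share.shape)
print("cell with the largest inertia share:", worst_cell)
print("inertia shares by cell:")
print(cell_share.round(3))
```

In this toy table the inflated cell carries the largest share of the inertia, while the other two cells in its row hold ordinary counts. Treating the whole row as a supplementary point would discard those counts as well, which is the loss of information the abstract describes.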