Multi-way data extend two-way matrices to higher-dimensional arrays (tensors) and are often explored through dimension-reduction techniques. In this paper, we study the Parallel Factor Analysis (PARAFAC) model for multi-way data, which represents the data compactly through a small set of loading matrices and scores. We assume that the data may be incomplete and may contain both rowwise outliers, i.e. cases that deviate from the majority, and cellwise outliers, i.e. outlying cells dispersed throughout the data array. To address these challenges, we present a novel algorithm that robustly estimates both the loadings and the scores. We also introduce an enhanced outlier map that distinguishes different patterns of outlying behavior. Through simulations and an analysis of fluorescence Excitation-Emission Matrix (EEM) data, we demonstrate the robustness of our approach. The results also highlight the effectiveness of the diagnostic tools for identifying and interpreting unusual patterns in the data.
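The following minimal sketch may help readers who are new to PARAFAC. It shows only the standard (classical, non-robust) trilinear model and does not implement the robust algorithm described above. An I×J×K array is approximated by a sum of F rank-one terms built from a score matrix A (I×F) and loading matrices B (J×F) and C (K×F), so that x_ijk ≈ Σ_f a_if b_jf c_kf. The helper name `parafac_reconstruct` and the example dimensions are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def parafac_reconstruct(A, B, C):
    """Rebuild an I x J x K array from PARAFAC factors (hypothetical helper).

    A: scores, shape (I, F); B, C: loadings, shapes (J, F) and (K, F).
    Entry (i, j, k) equals the sum over f of A[i, f] * B[j, f] * C[k, f].
    """
    return np.einsum("if,jf,kf->ijk", A, B, C)

rng = np.random.default_rng(0)
I, J, K, F = 5, 4, 3, 2  # illustrative sizes, not from the paper
A = rng.normal(size=(I, F))
B = rng.normal(size=(J, F))
C = rng.normal(size=(K, F))
X_hat = parafac_reconstruct(A, B, C)

# The same array written as an explicit sum of F rank-one (outer-product) terms
X_sum = sum(np.multiply.outer(np.outer(A[:, f], B[:, f]), C[:, f]) for f in range(F))

print(X_hat.shape)                # (5, 4, 3)
print(np.allclose(X_hat, X_sum))  # True
# Compactness: F * (I + J + K) factor entries versus I * J * K array cells
print(F * (I + J + K), I * J * K)  # 24 60
```

The `einsum` form and the explicit outer-product sum give the same array. The last line shows why the representation is compact: the factors need F(I+J+K) numbers, whereas the full array has IJK cells.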