Multi-way data extend two-way matrices into higher-dimensional tensors and are often explored through dimension reduction techniques. In this paper, we study the Parallel Factor Analysis (PARAFAC) model for multi-way data, which represents the data more compactly through a small set of loading matrices and scores. We assume that the data may be incomplete and may contain both rowwise and cellwise outliers, that is, entire cases that deviate from the majority as well as outlying cells dispersed throughout the data array. To address these challenges, we present a novel algorithm that robustly estimates both loadings and scores. We also introduce an enhanced outlier map that distinguishes between different patterns of outlying behavior. Simulations and an analysis of fluorescence Excitation-Emission Matrix (EEM) data demonstrate the robustness of our approach. The results underscore the effectiveness of the diagnostic tools in identifying and interpreting unusual patterns in the data.
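For orientation, the trilinear structure behind PARAFAC can be sketched in a few lines. In a three-way PARAFAC model with F components, each cell is approximated as x_ijk ≈ Σ_f a_if b_jf c_kf, where A holds the scores and B and C hold the loadings. The sketch below fits this model by classical alternating least squares on complete, outlier-free data. It is not the robust algorithm proposed in this paper, it handles neither missing cells nor outliers, and the function names (`khatri_rao`, `reconstruct`, `parafac_als`) are hypothetical names chosen for illustration.

```python
import numpy as np


def khatri_rao(X, Y):
    """Column-wise Kronecker product of X (J x F) and Y (K x F), shape (J*K, F)."""
    F = X.shape[1]
    return (X[:, None, :] * Y[None, :, :]).reshape(-1, F)


def reconstruct(A, B, C):
    """Rebuild the I x J x K array with x_ijk = sum_f a_if * b_jf * c_kf."""
    return np.einsum("if,jf,kf->ijk", A, B, C)


def parafac_als(X, n_components, n_iter=200, seed=0):
    """Classical (non-robust) alternating least squares for a three-way array.

    Assumption: X is complete and free of outliers. This is an illustrative
    baseline, not the robust estimator described in the abstract.
    """
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.standard_normal((I, n_components))
    B = rng.standard_normal((J, n_components))
    C = rng.standard_normal((K, n_components))

    # Mode-wise unfoldings whose column order matches khatri_rao above.
    X1 = X.reshape(I, J * K)                     # rows i, columns (j, k)
    X2 = np.moveaxis(X, 1, 0).reshape(J, I * K)  # rows j, columns (i, k)
    X3 = np.moveaxis(X, 2, 0).reshape(K, I * J)  # rows k, columns (i, j)

    for _ in range(n_iter):
        # Each update is an ordinary least-squares solve with the other
        # two factor matrices held fixed.
        A = X1 @ np.linalg.pinv(khatri_rao(B, C)).T
        B = X2 @ np.linalg.pinv(khatri_rao(A, C)).T
        C = X3 @ np.linalg.pinv(khatri_rao(A, B)).T
    return A, B, C


# Usage on synthetic, noise-free data generated from known factors.
rng = np.random.default_rng(1)
A0, B0, C0 = (rng.standard_normal((n, 2)) for n in (6, 5, 4))
X = reconstruct(A0, B0, C0)
A, B, C = parafac_als(X, n_components=2)
rel_error = np.linalg.norm(X - reconstruct(A, B, C)) / np.linalg.norm(X)
print(f"relative reconstruction error: {rel_error:.2e}")
```

Because every update minimizes an unweighted sum of squared residuals, a single outlying cell or case can distort all three factor matrices, which is the weakness a robust estimator is meant to address.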