Factor analysis based on the multivariate $t$ distribution ($t$fa) is a useful robust tool for extracting common factors from heavy-tailed or contaminated data. However, $t$fa is only applicable to vector data. When $t$fa is applied to matrix data, the matrix observations are commonly vectorized first. This introduces two challenges for $t$fa: (i) the inherent matrix structure of the data is broken, and (ii) robustness may be lost, because vectorizing matrix data typically produces a high data dimension, which can easily lead to the breakdown of $t$fa. To address these issues, starting from the intrinsic matrix structure of matrix data, this paper proposes a novel robust factor analysis model, namely bilinear factor analysis built on the matrix-variate $t$ distribution ($t$bfa). Its novelty is that it can simultaneously extract common factors for both the row and column variables of interest from heavy-tailed or contaminated matrix data. Two efficient algorithms for maximum likelihood estimation of $t$bfa are developed. Closed-form expressions for the Fisher information matrix, used to assess the accuracy of the parameter estimates, are derived. Empirical studies are conducted to examine the proposed $t$bfa model and compare it with related competitors. The results demonstrate the superiority and practicality of $t$bfa. Importantly, $t$bfa exhibits a significantly higher breakdown point than $t$fa, making it more suitable for matrix data.
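To make challenge (ii) a little more concrete, the sketch below counts the free parameters in the scale matrix of a vectorized $p \times q$ observation. It compares an unstructured $pq \times pq$ symmetric matrix with the Kronecker form $\Psi \otimes \Sigma$ (column scale $\Psi$ of size $q \times q$, row scale $\Sigma$ of size $p \times p$) that matrix-variate models, such as the matrix-variate $t$ built as a scale mixture of matrix normals, retain for $\mathrm{vec}(X)$. This is an illustrative count under those assumptions only, not the parameterization of $t$bfa itself, and the function names are hypothetical.

```python
def unstructured_params(p: int, q: int) -> int:
    """Free parameters in a symmetric (pq x pq) scale matrix for vec(X)."""
    d = p * q  # dimension after vectorizing a p x q observation
    return d * (d + 1) // 2


def kronecker_params(p: int, q: int) -> int:
    """Free parameters in a Kronecker-structured scale Psi (q x q) kron Sigma (p x p).

    One degree of freedom is subtracted because (c * Psi) kron (Sigma / c)
    yields the same matrix for any c > 0, so the overall scale is shared.
    """
    return p * (p + 1) // 2 + q * (q + 1) // 2 - 1


if __name__ == "__main__":
    for p, q in [(5, 5), (20, 20), (32, 32)]:
        print(
            f"{p}x{q}: vec dim={p * q}, "
            f"unstructured={unstructured_params(p, q)}, "
            f"kronecker={kronecker_params(p, q)}"
        )
```

For a $32 \times 32$ observation the vectorized dimension is 1024, and an unstructured scale matrix has 524800 free entries against 1055 for the Kronecker form, which illustrates how vectorization both discards the row/column structure and inflates the dimension a vector-based method must handle.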