An old problem in multivariate statistics is that linear Gaussian models are often unidentifiable, i.e. some parameters cannot be uniquely estimated. In factor (component) analysis, an orthogonal rotation of the factors is unidentifiable, while in linear regression, the direction of effect cannot be identified. For such linear models, non-Gaussianity of the (latent) variables has been shown to provide identifiability. In the case of factor analysis, this leads to independent component analysis, while in the case of the direction of effect, non-Gaussian versions of structural equation modelling solve the problem. More recently, we have shown how even general nonparametric nonlinear versions of such models can be estimated. Non-Gaussianity is not enough in this case, but assuming we have time series, or that the distributions are suitably modulated by some observed auxiliary variables, the models are identifiable. This paper reviews the identifiability theory for the linear and nonlinear cases, considering both factor analytic models and structural equation models.
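The rotation indeterminacy mentioned above can be illustrated with a small numerical sketch. It is not taken from the paper: the mixing matrix `A`, the rotation angle, the sample size, and the helper names `rotation` and `excess_kurtosis` are all assumptions chosen for illustration. The sketch shows that replacing the loading matrix by a rotated version leaves the covariance of the observations unchanged, so second-order (Gaussian) information cannot pin down the rotation, whereas a fourth-order statistic of non-Gaussian factors does change under rotation.

```python
import numpy as np

def rotation(theta):
    """2x2 orthogonal rotation matrix for angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def excess_kurtosis(x):
    """Column-wise excess kurtosis (zero in theory for a Gaussian)."""
    x = x - x.mean(axis=0)
    return (x**4).mean(axis=0) / (x**2).mean(axis=0) ** 2 - 3.0

# Hypothetical loading (mixing) matrix and rotation, chosen for illustration.
A = np.array([[2.0, 1.0], [0.5, 1.5]])
U = rotation(np.pi / 4)

# With unit-variance, uncorrelated factors the observed covariance is A A^T.
# Rotating the factors replaces A by A U, and (A U)(A U)^T = A A^T, so the
# covariance alone cannot distinguish A from A U.
cov_original = A @ A.T
cov_rotated = (A @ U) @ (A @ U).T
same_covariance = np.allclose(cov_original, cov_rotated)

# Simulated factors: unit-variance uniform (non-Gaussian) and standard normal.
rng = np.random.default_rng(0)
n = 200_000
s_uniform = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(n, 2))
s_gauss = rng.standard_normal((n, 2))

# Rotated Gaussian factors remain Gaussian, so their kurtosis carries no
# information about U. Rotated uniform factors become mixtures whose
# kurtosis differs from that of the original factors.
kurt_uniform = excess_kurtosis(s_uniform)
kurt_uniform_rot = excess_kurtosis(s_uniform @ U)
kurt_gauss_rot = excess_kurtosis(s_gauss @ U)

print("covariance unchanged by rotation:", same_covariance)
print("excess kurtosis, uniform factors:", kurt_uniform)
print("excess kurtosis, rotated uniform factors:", kurt_uniform_rot)
print("excess kurtosis, rotated Gaussian factors:", kurt_gauss_rot)
```

In theory the excess kurtosis of a uniform variable is -1.2, and an equal-weight rotation of two independent uniform factors halves it to -0.6, while for Gaussian factors it stays at zero for every rotation. The simulated values should land close to these figures. This is the kind of higher-order information that independent component analysis exploits, although practical ICA estimators use more refined criteria than this single statistic.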