An old problem in multivariate statistics is that linear Gaussian models are often unidentifiable, i.e., some parameters cannot be uniquely estimated. In factor analysis, an orthogonal rotation of the factors is unidentifiable, while in linear regression, the direction of effect cannot be identified. For such linear models, non-Gaussianity of the (latent) variables has been shown to provide identifiability. In the case of factor analysis, this leads to independent component analysis, while in the case of the direction of effect, non-Gaussian versions of structural equation modelling solve the problem. More recently, we have shown how even general nonparametric nonlinear versions of such models can be estimated. Non-Gaussianity is not enough in this case, but the models become identifiable if we have time series, or if the distributions are suitably modulated by some observed auxiliary variables. This paper reviews the identifiability theory for the linear and nonlinear cases, considering both factor analytic models and structural equation models.
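The two linear identifiability claims above can be checked numerically. The following is a minimal sketch of our own, not code from the paper and not an estimation algorithm. Assumptions: two unit-variance sources, uniform noise as the non-Gaussian example, a 2x2 rotation for the factor model, and a two-variable model `y = 0.8 x + 0.6 e` for the direction of effect. Sample excess kurtosis and the correlation of squared regression residuals with the squared regressor serve as crude higher-order statistics in place of a proper independence test. All function names are hypothetical.

```python
import numpy as np

# Illustrative sketch only: names and parameter values are hypothetical choices.
rng = np.random.default_rng(0)
N = 200_000  # sample size; large so that sampling noise is small


def unit_uniform(size):
    """Uniform samples with zero mean and unit variance (a non-Gaussian example)."""
    return rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size)


def excess_kurtosis(v):
    """Sample excess kurtosis: about 0 for Gaussian data, about -1.2 for uniform data."""
    z = (v - v.mean()) / v.std()
    return float(np.mean(z ** 4) - 3.0)


def rotate(sources, theta):
    """Apply a 2x2 orthogonal rotation to a (2, N) array of sources."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]]) @ sources


def rotation_kurtosis(sources, theta):
    """Excess kurtosis of each component after rotation by theta."""
    return [excess_kurtosis(row) for row in rotate(sources, theta)]


def residual_dependence(regressor, target):
    """Regress target on regressor by least squares; return corr(residual^2, regressor^2).

    The residual is uncorrelated with the regressor by construction, so this
    correlation of squares is a crude probe of the remaining higher-order
    dependence, not a formal independence test.
    """
    u = regressor - regressor.mean()
    t = target - target.mean()
    slope = np.dot(u, t) / np.dot(u, u)
    resid = t - slope * u
    return float(np.corrcoef(resid ** 2, u ** 2)[0, 1])


# Factor rotation: the covariance stays (close to) the identity for any theta,
# so second-order statistics cannot tell rotations apart. Kurtosis can, but
# only when the sources are non-Gaussian.
gauss = rng.standard_normal((2, N))
unif = unit_uniform((2, N))
for theta in (0.0, np.pi / 4):
    print(f"theta={theta:.2f}",
          "cov(uniform)=", np.round(np.cov(rotate(unif, theta)), 2).tolist(),
          "kurt(gaussian)=", np.round(rotation_kurtosis(gauss, theta), 2).tolist(),
          "kurt(uniform)=", np.round(rotation_kurtosis(unif, theta), 2).tolist())

# Direction of effect: x causes y with the same coefficients in both settings.
x_u = unit_uniform(N)
y_u = 0.8 * x_u + 0.6 * unit_uniform(N)
x_g = rng.standard_normal(N)
y_g = 0.8 * x_g + 0.6 * rng.standard_normal(N)
for name, x, y in (("uniform", x_u, y_u), ("gaussian", x_g, y_g)):
    print(name,
          "x->y:", round(residual_dependence(x, y), 3),
          "y->x:", round(residual_dependence(y, x), 3))
```

With uniform sources, the kurtosis of a rotated component should move from about -1.2 at theta = 0 to about -0.6 at theta = pi/4, while the Gaussian values stay near 0 for every rotation. For the regression, only the backward (y->x) fit on uniform data should leave a clearly nonzero dependence (around -0.4 for these coefficients); in the Gaussian case both directions look the same. Kurtosis and squared correlations were chosen only because they are the simplest statistics beyond second order.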