Factor models have been widely used to summarize the variability of high-dimensional data through a much lower-dimensional set of factors. Gaussian linear factor models have been particularly popular due to their interpretability and ease of computation. In practice, however, data often violate the multivariate Gaussian assumption. To characterize higher-order dependence and nonlinearity, a popular approach is to include the factors as predictors in a flexible multivariate regression: Gaussian process latent variable models (GP-LVMs) place Gaussian process (GP) priors on the regression function, and variational autoencoders (VAEs) use deep neural networks. Unfortunately, such approaches lack identifiability and interpretability and tend to produce brittle, non-reproducible results. To address these problems, we propose the NIFTY framework, which simplifies the nonparametric factor model while maintaining flexibility: it parsimoniously transforms uniform latent variables through one-dimensional nonlinear mappings and then applies a linear generative model. The induced multivariate distribution belongs to a flexible class, yet computation and interpretation remain simple. We prove that this model is identifiable and study NIFTY empirically on simulated data, observing good performance in density estimation and data visualization. We then apply NIFTY to bird song data in an environmental monitoring application.
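To make the generative structure described above concrete, the sketch below simulates data under one possible reading of it: each latent coordinate is drawn uniformly on (0, 1), passed through its own one-dimensional map, and the mapped factors are combined linearly with added Gaussian noise. The function name `sample_nifty_like`, the particular maps, the loadings, and the Gaussian noise are all illustrative assumptions, not the paper's exact specification. One useful check on this reading is that choosing the standard normal quantile function as every map turns the uniform latents into standard normal factors, which recovers an ordinary Gaussian linear factor model.

```python
import math
import random
from statistics import NormalDist


def sample_nifty_like(n, loadings, maps, noise_sd, seed=0):
    """Simulate n rows from a hypothetical NIFTY-style generative sketch.

    Assumed model (illustrative only):
      eta_h ~ Uniform(0, 1)               for each latent factor h
      z_h   = maps[h](eta_h)              one-dimensional nonlinear map per factor
      y_j   = sum_h loadings[j][h] * z_h + N(0, noise_sd[j]^2)
    """
    rng = random.Random(seed)
    k = len(maps)
    eps = 1e-12  # keeps uniforms strictly inside (0, 1) so quantile maps stay finite
    data = []
    for _ in range(n):
        # Draw the uniform latent variables.
        eta = [min(max(rng.random(), eps), 1.0 - eps) for _ in range(k)]
        # Transform each latent coordinate with its own one-dimensional map.
        z = [maps[h](eta[h]) for h in range(k)]
        # Linear generative step plus independent Gaussian noise per output.
        y = [
            sum(row[h] * z[h] for h in range(k)) + rng.gauss(0.0, sd)
            for row, sd in zip(loadings, noise_sd)
        ]
        data.append(y)
    return data


# Hypothetical example: one Gaussian-like factor and one non-monotone factor.
maps = [NormalDist().inv_cdf, lambda u: math.sin(2.0 * math.pi * u)]
loadings = [[1.0, 0.0], [0.5, 1.0], [0.0, 2.0]]
Y = sample_nifty_like(500, loadings, maps, noise_sd=[0.1, 0.1, 0.1], seed=1)
print(len(Y), len(Y[0]))  # 500 3
```

Because every nonlinearity lives in a one-dimensional map applied before a linear step, each factor's effect can be read off by inspecting its map and its column of loadings. This is one way to see how such a model could stay interpretable while still producing non-Gaussian data.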