New models are needed for characterizing dependence in multivariate data. The multivariate Gaussian distribution is routinely used but cannot characterize nonlinear relationships in the data. Most nonlinear extensions tend to be highly complex, for example requiring estimation of a nonlinear regression model in latent variables. In this article, we propose a relatively simple class of Ellipsoid-Gaussian multivariate distributions, derived from a Gaussian linear factor model whose latent variables follow a von Mises-Fisher distribution on a unit hypersphere. We show that the Ellipsoid-Gaussian distribution can flexibly model curved relationships among variables with lower-dimensional structure. Taking a Bayesian approach, we propose a hybrid of gradient-based geodesic Monte Carlo and adaptive Metropolis for posterior sampling. We derive basic properties of the Ellipsoid-Gaussian distribution and illustrate its utility in a variety of simulated and real-data applications. An accompanying R package is also available.
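
To make the construction concrete, the following is a minimal simulation sketch and not the accompanying R package. It assumes the generative form x = c + Λη + ε, with η on the unit circle (the two-dimensional special case of the von Mises-Fisher distribution, which reduces to the von Mises distribution) and isotropic Gaussian noise. The function name, parameter names, and numerical values are hypothetical and chosen only for illustration.

```python
import numpy as np


def sample_ellipsoid_gaussian_2d(n, center, loadings, mean_angle,
                                 concentration, noise_sd, seed=0):
    """Draw n points from an assumed model x = center + loadings @ eta + noise.

    eta lies on the unit circle and follows a von Mises distribution,
    i.e. the von Mises-Fisher distribution with a 2-dimensional latent
    space. This is an illustrative assumption, not the authors' code.
    """
    rng = np.random.default_rng(seed)
    # Latent angles on the circle, concentrated around mean_angle.
    theta = rng.vonmises(mean_angle, concentration, size=n)
    # Map angles to unit vectors: shape (n, 2), each row has norm 1.
    eta = np.column_stack([np.cos(theta), np.sin(theta)])
    # Isotropic Gaussian noise in the observed space (assumed form).
    noise = rng.normal(0.0, noise_sd, size=(n, loadings.shape[0]))
    # Linear factor model: the circle is mapped to an ellipse in p dimensions.
    x = center + eta @ loadings.T + noise
    return x, eta


# Hypothetical parameters: 3 observed variables, 2 latent dimensions.
center = np.array([1.0, -2.0, 0.5])
loadings = np.array([[3.0, 0.0],
                     [0.0, 1.0],
                     [1.0, 1.0]])
x, eta = sample_ellipsoid_gaussian_2d(
    n=500, center=center, loadings=loadings,
    mean_angle=0.0, concentration=2.0, noise_sd=0.05,
)
```

Under these assumptions the samples concentrate near an ellipse embedded in three dimensions, with density along the ellipse governed by the concentration parameter. A larger concentration bunches points around one arc, while a concentration near zero spreads them around the whole ellipse.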