Multivariate data containing both continuous and discrete variables, known as mixed outcomes, arise widely in fields such as ecology, epidemiology, and climatology. Estimating the dependence structure among mixed outcomes is essential for understanding the probability structure of such data. However, when the multivariate data are accompanied by location information, spatial correlation must be properly taken into account; otherwise, the estimated dependence structure would be severely biased. To address this issue, we propose a semiparametric Bayesian approach for inferring the dependence structure among mixed outcomes while removing the effect of spatial correlation. To this end, we consider a hierarchical spatial model based on the rank likelihood and a latent multivariate Gaussian process. We develop an efficient Markov chain Monte Carlo (MCMC) algorithm for computing the posterior distribution, and we provide a scalable implementation of the model for large spatial datasets using the nearest-neighbor Gaussian process. A simulation study validates the proposed procedure and demonstrates that it successfully accounts for spatial correlation and correctly infers the dependence structure among outcomes. Furthermore, we apply the procedure to a real dataset collected during an international synoptic krill survey in the Scotia Sea near the Antarctic Peninsula, which includes sighting data of fin whales (Balaenoptera physalus) and relevant oceanographic data.
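The abstract does not spell out the sampler. The block below is a minimal sketch, assuming the standard extended-rank-likelihood Gibbs step for Gaussian copula models (Hoff, 2007) rather than the authors' exact algorithm. Each latent Gaussian value is redrawn from its full conditional and truncated to the interval that keeps the latent values in the same order as the observed data. The sketch assumes independent sites and a known latent precision matrix `Q`. It omits the spatial part of the model (the latent multivariate Gaussian process and its nearest-neighbor approximation). The function names `normal_scores`, `truncated_normal`, and `update_latent_column` are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def normal_scores(column):
    """Initialise latent values consistent with the ranks of one observed variable.

    Tied observations get the same average-rank score, so the starting state
    already satisfies the rank constraints (ties are unconstrained relative
    to each other).
    """
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and column[order[j + 1]] == column[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # 1-based average rank of the tie block
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return [STD_NORMAL.inv_cdf(r / (n + 1)) for r in ranks]


def truncated_normal(mu, sigma, lower, upper, rng):
    """Draw from N(mu, sigma^2) restricted to [lower, upper] by inverse-CDF sampling."""
    dist = NormalDist(mu, sigma)
    u = rng.uniform(dist.cdf(lower), dist.cdf(upper))
    u = min(max(u, 1e-12), 1 - 1e-12)  # inv_cdf rejects exactly 0 and 1
    return min(max(dist.inv_cdf(u), lower), upper)


def update_latent_column(z, y, precision, j, rng):
    """One Gibbs sweep over column j of the n x p latent matrix z.

    z[i][j] is drawn from its Gaussian full conditional given the other
    latent values in row i, truncated to the interval implied by the ranks
    of column j of y. y[i][j] = None marks a missing value, drawn untruncated.
    """
    n, p = len(z), len(z[0])
    q_jj = precision[j][j]
    sigma = 1.0 / math.sqrt(q_jj)
    for i in range(n):
        # Conditional mean via the precision matrix: -sum_{k != j} Q_jk z_ik / Q_jj.
        mu = -sum(precision[j][k] * z[i][k] for k in range(p) if k != j) / q_jj
        lower, upper = -math.inf, math.inf
        if y[i][j] is not None:
            for r in range(n):
                if y[r][j] is None:
                    continue
                if y[r][j] < y[i][j]:
                    lower = max(lower, z[r][j])
                elif y[r][j] > y[i][j]:
                    upper = min(upper, z[r][j])
        z[i][j] = truncated_normal(mu, sigma, lower, upper, rng)


# Usage: column 0 is continuous, column 1 is ordinal (with a tie).
rng = random.Random(0)
y = [[1.2, 0], [0.4, 1], [2.5, 1], [1.9, 2]]
p = 2
z = [list(row) for row in zip(*(normal_scores([r[j] for r in y]) for j in range(p)))]
rho = 0.5  # assumed latent correlation; Q is the inverse of [[1, rho], [rho, 1]]
Q = [[1 / (1 - rho**2), -rho / (1 - rho**2)], [-rho / (1 - rho**2), 1 / (1 - rho**2)]]
for _ in range(100):
    for j in range(p):
        update_latent_column(z, y, Q, j, rng)
```

Only the ordering of each observed variable enters, through the truncation bounds, so the same step covers continuous and discrete outcomes; tied discrete values simply share an interval. The pairwise scan for bounds costs O(n) per draw. A practical implementation would likely precompute each column's rank order and would replace the fixed `Q` with draws from the full (spatial) model.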