Many analyses of multivariate data focus on evaluating the dependence between two sets of variables, rather than the dependence among individual variables within each set. Canonical correlation analysis (CCA) is a classical data analysis technique that estimates parameters describing the dependence between such sets. However, inference procedures based on traditional CCA rely on the assumption that all variables are jointly normally distributed. We present a semiparametric approach to CCA in which the multivariate margins of each variable set may be arbitrary, but the dependence between variable sets is described by a parametric model that provides low-dimensional summaries of dependence. While maximum likelihood estimation in the proposed model is intractable, we propose two estimation strategies: one using a pseudolikelihood for the model and one using a Markov chain Monte Carlo (MCMC) algorithm that provides Bayesian estimates and confidence regions for the between-set dependence parameters. The MCMC algorithm is derived from a multirank likelihood function, which uses only part of the information in the observed data in exchange for being free of assumptions about the multivariate margins. We apply the proposed Bayesian inference procedure to Brazilian climate data and monthly stock returns from the materials and communications market sectors.
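For context, the classical CCA that this work generalizes can be sketched in a few lines. The following is a minimal illustration of the standard sample canonical correlations, computed as the singular values of the whitened cross-covariance matrix; it is not the semiparametric estimator proposed here. The function name `canonical_correlations` and the simulated data are assumptions made for illustration only.

```python
import numpy as np


def canonical_correlations(X, Y):
    """Classical sample canonical correlations between variable sets X and Y.

    X is an (n, p) array and Y is an (n, q) array observed on the same n units.
    Returns min(p, q) canonical correlations in descending order.
    """
    # Center each variable set.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]

    # Sample within-set and between-set covariance matrices.
    Sxx = X.T @ X / (n - 1)
    Syy = Y.T @ Y / (n - 1)
    Sxy = X.T @ Y / (n - 1)

    # Cholesky factors: Sxx = Lx Lx^T and Syy = Ly Ly^T.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)

    # Whitened cross-covariance M = Lx^{-1} Sxy Ly^{-T}.
    M = np.linalg.solve(Lx, Sxy)
    M = np.linalg.solve(Ly, M.T).T

    # The singular values of M are the canonical correlations.
    return np.linalg.svd(M, compute_uv=False)


# Illustrative usage on simulated data (hypothetical, not from the paper).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(500, 3))
Y_demo = X_demo[:, :2] + rng.normal(size=(500, 2))
print(canonical_correlations(X_demo, Y_demo))
```

These canonical correlations are unchanged by invertible linear transformations within each variable set, which is why they summarize between-set rather than within-set dependence. Inference based on them, however, traditionally assumes joint normality, the assumption the semiparametric approach is designed to relax.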