The starting point for much of multivariate analysis (MVA) is an $n\times p$ data matrix whose $n$ rows represent observations and whose $p$ columns represent variables. Some multivariate data sets, however, may be best conceptualized not as $n$ discrete $p$-variate observations, but as $p$ curves or functions defined on a common time interval. Here we introduce a framework for extending techniques of multivariate analysis to such settings. The proposed continuous-time multivariate analysis (CTMVA) framework rests on the assumption that the curves can be represented as linear combinations of basis functions such as $B$-splines, as in the Ramsay-Silverman representation of functional data; but whereas functional data analysis extends MVA to the case of observations that are curves rather than vectors -- heuristically, $n\times p$ data with $p$ infinite -- we are instead concerned with what happens when $n$ is infinite. We present continuous-time extensions of the classical MVA methods of covariance and correlation estimation, principal component analysis, Fisher's linear discriminant analysis, and $k$-means clustering. We show that CTMVA can improve on the performance of classical MVA, in particular for correlation estimation and clustering, and can be applied in some settings where classical MVA cannot, including variables observed at disparate time points. CTMVA is illustrated with a novel perspective on a well-known Canadian weather data set, and with applications to data sets involving international development, brain signals, and air quality. The proposed methods are implemented in the publicly available R package \texttt{ctmva}.
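To make the "infinite $n$" idea concrete, the following minimal sketch computes a continuous-time covariance and correlation matrix for curves expressed in a basis. It rests on assumptions not stated in the abstract: the continuous-time mean and covariance are taken to be time averages over the interval (integrals divided by interval length, replacing sums divided by $n$), a three-term Fourier basis stands in for $B$-splines to keep the example self-contained, and the integrals are approximated by the trapezoid rule. The names `fourier_basis`, `ct_cov`, and `ct_cor` are hypothetical and are not the interface of the \texttt{ctmva} package.

```python
import numpy as np


def fourier_basis(t):
    """Illustrative basis [1, sin(2*pi*t), cos(2*pi*t)] evaluated at times t."""
    t = np.asarray(t, dtype=float)
    return np.column_stack(
        [np.ones_like(t), np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)]
    )


def ct_cov(coefs, basis, t_min=0.0, t_max=1.0, n_grid=2001):
    """Continuous-time covariance of p curves x_j(t) = sum_k coefs[k, j] * phi_k(t).

    Assumed definition: time averages over [t_min, t_max] replace sample averages.
    coefs has shape (K, p); the result has shape (p, p).
    """
    grid = np.linspace(t_min, t_max, n_grid)
    h = grid[1] - grid[0]
    # Trapezoid-rule weights, rescaled so that they average over the interval.
    w = np.full(n_grid, h)
    w[0] = w[-1] = h / 2
    w /= t_max - t_min
    phi = basis(grid)                  # (n_grid, K) basis evaluations
    m = phi.T @ w                      # time average of each basis function
    g = phi.T @ (phi * w[:, None])     # time average of phi_k(t) * phi_l(t)
    return coefs.T @ (g - np.outer(m, m)) @ coefs


def ct_cor(coefs, basis, **kwargs):
    """Continuous-time correlation matrix derived from ct_cov."""
    s = ct_cov(coefs, basis, **kwargs)
    d = np.sqrt(np.diag(s))
    return s / np.outer(d, d)


# Three curves on [0, 1]: sin(2*pi*t), cos(2*pi*t), and 2 + sin(2*pi*t).
coefs = np.array(
    [
        [0.0, 0.0, 2.0],   # coefficients on the constant function
        [1.0, 0.0, 1.0],   # coefficients on sin(2*pi*t)
        [0.0, 1.0, 0.0],   # coefficients on cos(2*pi*t)
    ]
)
S = ct_cov(coefs, fourier_basis)
R = ct_cor(coefs, fourier_basis)
```

In this toy case the sine and cosine curves are uncorrelated over a full period, while the first and third curves differ only by a constant shift and are therefore perfectly correlated. Because the covariance depends on the data only through the coefficient matrix, curves whose coefficients were fitted from observations at different time points can be handled in the same way.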