The starting point for much of multivariate analysis (MVA) is an $n\times p$ data matrix whose $n$ rows represent observations and whose $p$ columns represent variables. Some multivariate data sets, however, may be best conceptualized not as $n$ discrete $p$-variate observations, but as $p$ curves or functions defined on a common time interval. We introduce a framework for extending techniques of multivariate analysis to such settings. The proposed framework rests on the assumption that the curves can be represented as linear combinations of basis functions such as B-splines. This is formally identical to the Ramsay-Silverman representation of functional data; but whereas functional data analysis extends MVA to the case of observations that are curves rather than vectors -- heuristically, $n\times p$ data with $p$ infinite -- we are instead concerned with what happens when $n$ is infinite. We describe how to translate the classical MVA methods of covariance and correlation estimation, principal component analysis, Fisher's linear discriminant analysis, and $k$-means clustering to the continuous-time setting. We illustrate the methods with a novel perspective on a well-known Canadian weather data set, and with applications to neurobiological and environmetric data. The methods are implemented in the publicly available R package \texttt{ctmva}.
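To make the basis-function idea concrete, the following is a minimal sketch of a continuous-time covariance and correlation computation. It is not the \texttt{ctmva} API, and several choices are assumptions made only for illustration: the interval is taken to be $[0,1]$, a monomial basis $\phi_k(t)=t^k$ is swapped in for B-splines so that all integrals have closed forms, and the covariance of curves $x_i$ and $x_j$ is taken to be $\int_0^1 (x_i(t)-\bar{x}_i)(x_j(t)-\bar{x}_j)\,dt$, where $\bar{x}_i=\int_0^1 x_i(t)\,dt$. Writing $x_i(t)=\sum_k c_{ik}\phi_k(t)$, this integral reduces to a quadratic form in the coefficient vectors. The function names are hypothetical.

```python
import math


def gram_monomial(K):
    """Gram matrix G[j][k] = integral over [0, 1] of t^j * t^k = 1 / (j + k + 1)."""
    return [[1.0 / (j + k + 1) for k in range(K)] for j in range(K)]


def mean_monomial(K):
    """Time averages of the basis functions: m[k] = integral over [0, 1] of t^k."""
    return [1.0 / (k + 1) for k in range(K)]


def ct_cov(coefs):
    """Continuous-time covariance matrix of p curves on [0, 1].

    coefs[i][k] is the coefficient of t^k in curve i (illustrative monomial basis).
    Cov(i, j) = c_i^T (G - m m^T) c_j, the closed form of the centered integral.
    """
    K = len(coefs[0])
    G = gram_monomial(K)
    m = mean_monomial(K)
    # Centered Gram matrix: subtracting m m^T removes each curve's time average.
    W = [[G[j][k] - m[j] * m[k] for k in range(K)] for j in range(K)]
    p = len(coefs)
    return [
        [
            sum(coefs[a][j] * W[j][k] * coefs[b][k]
                for j in range(K) for k in range(K))
            for b in range(p)
        ]
        for a in range(p)
    ]


def ct_cor(coefs):
    """Continuous-time correlation matrix derived from ct_cov."""
    S = ct_cov(coefs)
    p = len(S)
    return [
        [S[a][b] / math.sqrt(S[a][a] * S[b][b]) for b in range(p)]
        for a in range(p)
    ]


# Two curves on [0, 1]: x_1(t) = t and x_2(t) = t^2, in the basis {1, t, t^2}.
coefs = [[0.0, 1.0, 0.0],
         [0.0, 0.0, 1.0]]
S = ct_cov(coefs)
R = ct_cor(coefs)
```

For these two curves the quadratic form gives variances $1/12$ and $4/45$, covariance $1/12$, and correlation $\sqrt{15}/4$, matching direct integration. With a B-spline basis the same structure would apply, but the Gram matrix and the basis averages would have to be computed for that basis rather than taken from the monomial formulas used here.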