The starting point for much of multivariate analysis (MVA) is an $n\times p$ data matrix whose $n$ rows represent observations and whose $p$ columns represent variables. Some multivariate data sets, however, may be best conceptualized not as $n$ discrete $p$-variate observations, but as $p$ curves or functions defined on a common time interval. We introduce a framework for extending techniques of multivariate analysis to such settings. The proposed framework rests on the assumption that the curves can be represented as linear combinations of basis functions such as B-splines. This is formally identical to the Ramsay-Silverman representation of functional data; but whereas functional data analysis extends MVA to the case of observations that are curves rather than vectors -- heuristically, $n\times p$ data with $p$ infinite -- we are instead concerned with what happens when $n$ is infinite. We describe how to translate the classical MVA methods of covariance and correlation estimation, principal component analysis, Fisher's linear discriminant analysis, and $k$-means clustering to the continuous-time setting. We illustrate the methods with a novel perspective on a well-known Canadian weather data set, and with applications to neurobiological and environmetric data. The methods are implemented in the publicly available R package \texttt{ctmva}.
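As a minimal sketch of what the continuous-time translation could look like for covariance estimation, assume the $p$ curves on an interval $[a,b]$ are written as $x(t) = C^{\top}\phi(t)$, where $\phi(t)$ is a vector of $K$ basis functions and $C$ is a $K\times p$ coefficient matrix. The symbols $C$, $\phi$, $K$, $a$, $b$, $\mu$, and $\Sigma$ are illustrative notation introduced here, not necessarily the notation of the paper or of \texttt{ctmva}. Replacing the sample averages over $n$ rows with time averages over $[a,b]$ gives
\[
\mu = \frac{1}{b-a}\int_a^b x(t)\,dt = C^{\top}\bar{\phi},
\qquad
\bar{\phi} = \frac{1}{b-a}\int_a^b \phi(t)\,dt,
\]
\[
\Sigma = \frac{1}{b-a}\int_a^b \bigl(x(t)-\mu\bigr)\bigl(x(t)-\mu\bigr)^{\top}dt
= C^{\top}\left[\frac{1}{b-a}\int_a^b \phi(t)\phi(t)^{\top}dt - \bar{\phi}\,\bar{\phi}^{\top}\right] C .
\]
Under these assumptions the $p\times p$ matrix $\Sigma$ depends on the data only through $C$, and the bracketed $K\times K$ matrix depends only on the chosen basis. A continuous-time principal component analysis could then, for instance, proceed from the eigendecomposition of $\Sigma$.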