Statistical description and dimension reduction of continuous time categorical trajectories with multivariate functional principal components

Tools that allow simple representations and comparisons of a set of categorical trajectories are of major interest to statisticians. Without losing any information, we associate with each state a binary random indicator function, taking values in $\{0,1\}$, and turn the problem of statistical description of the categorical trajectories into a multivariate functional principal components analysis. This viewpoint encompasses experimental frameworks in which two or more states can be observed simultaneously. Because the sample paths are piecewise constant with a finite number of jumps, this is a rare case in functional data analysis in which the trajectories are not assumed to be continuous and can be observed exhaustively. Under the weak hypothesis that the $0$-$1$ trajectories are continuous in probability, the means and the (multivariate) covariance function are continuous and have interpretations in terms of departure from independence of the joint probabilities. Taking a functional data point of view, we show that the binary trajectories, which are right-continuous functions with left-hand limits, can be seen as random elements of the Hilbert space of square integrable functions. The multivariate functional principal components are simple to interpret, and we show that consistent estimators of the mean trajectories and the covariance functions can be obtained under weak regularity assumptions. The ability of the approach to represent categorical trajectories in a low-dimensional space is illustrated on a data set of sensory perceptions collected in several gustometer-controlled stimuli experiments.
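The encoding described above can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes simulated toy trajectories, a regular time grid on $[0,1]$ and a Riemann-sum approximation of the $L^2$ inner product, whereas the exhaustively observed paths would allow exact integrals. The function names `indicator_curves`, `multivariate_fpca` and `simulate_path` are hypothetical.

```python
import numpy as np

def indicator_curves(jump_times, states, n_states, grid):
    """Evaluate the 0/1 indicator function of each state on a time grid.

    jump_times[k] is the time at which the path enters states[k]
    (jump_times[0] is the starting time). Paths are right-continuous.
    """
    # Index of the last jump that occurred at or before each grid point.
    idx = np.searchsorted(jump_times, grid, side="right") - 1
    current = np.asarray(states)[idx]
    # Row s holds the indicator of state s: shape (n_states, len(grid)).
    return (current[None, :] == np.arange(n_states)[:, None]).astype(float)

def multivariate_fpca(X, dt, n_components):
    """Multivariate FPCA of curves X with shape (n, n_states, n_grid).

    Assumption: integrals are approximated by Riemann sums with step dt.
    Returns the mean curves, eigenvalues, eigenfunctions and scores.
    """
    n, n_states, n_grid = X.shape
    mean = X.mean(axis=0)
    Z = (X - mean).reshape(n, n_states * n_grid)
    # Gram matrix of inner products between centred multivariate curves.
    G = (Z @ Z.T) * dt / n
    evals, evecs = np.linalg.eigh(G)
    order = np.argsort(evals)[::-1][:n_components]
    evals, evecs = evals[order], evecs[:, order]
    # Eigenfunctions with unit norm under the discretised inner product.
    phi = (Z.T @ evecs) / np.sqrt(n * evals)
    scores = evecs * np.sqrt(n * evals)
    return mean, evals, phi.T.reshape(n_components, n_states, n_grid), scores

def simulate_path(rng, n_states, rate=5.0):
    """Toy path on [0, 1): exponential holding times, uniform random states."""
    times, states = [0.0], [int(rng.integers(n_states))]
    while True:
        t = times[-1] + rng.exponential(1.0 / rate)
        if t >= 1.0:
            break
        times.append(t)
        states.append(int(rng.integers(n_states)))
    return np.array(times), np.array(states)

rng = np.random.default_rng(0)
n_paths, n_states = 50, 3
grid = np.linspace(0.0, 1.0, 201)
dt = grid[1] - grid[0]

# One array of binary indicator curves per simulated trajectory.
X = np.stack([
    indicator_curves(*simulate_path(rng, n_states), n_states, grid)
    for _ in range(n_paths)
])
mean, evals, phi, scores = multivariate_fpca(X, dt, n_components=2)
print("share of total variance:", evals / (np.sum((X - mean) ** 2) * dt / n_paths))
```

In this toy setting exactly one state is active at each time, so the indicators sum to one. The same encoding also accepts several simultaneously active states, since each state has its own binary curve. Each row of `scores` gives the coordinates of one trajectory in the plane spanned by the first two components.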
