We develop methods, based on extreme value theory, for analysing observations in the tails of longitudinal data, i.e., a data set consisting of a large number of short time series that are typically irregularly and non-simultaneously sampled, yet share some common structure and are independent of one another. Extreme value theory has not previously been applied to the distinctive features of longitudinal data. Across time series, the data are assumed to follow a common generalised Pareto distribution above a high threshold. To account for the temporal dependence of such data, we require a model that describes (i) the variation in properties between the different time series, (ii) the changes in distribution over time, and (iii) the temporal dependence within each series. Our methodology has the flexibility to capture both asymptotic dependence and asymptotic independence, with this characteristic determined by the data. Bayesian inference is used because parameters that are unique to each time series must be inferred. Our novel methodology is illustrated through the analysis of data from elite swimmers in the men's 100m breaststroke. Unlike previous analyses of personal-best data in this event, we are able to make inference about the careers of individual swimmers, such as the probability that an individual will break the world record or swim the fastest time next year.
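As a rough illustration of the generalised Pareto assumption above a high threshold, the following minimal sketch evaluates the standard generalised Pareto survival function and uses it to compute the chance that a swim already faster than a threshold time also beats a target time. It is not the model of the paper: it ignores the between-series variation, the changes over time, and the within-series dependence. The function names, the threshold, and the scale and shape values are hypothetical and chosen only for illustration.

```python
import math


def gpd_survival(y, sigma, xi):
    """P(X - u > y | X > u) for a generalised Pareto excess with scale sigma and shape xi."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if y <= 0:
        return 1.0
    if abs(xi) < 1e-12:
        # The shape -> 0 limit is the exponential distribution.
        return math.exp(-y / sigma)
    base = 1.0 + xi * y / sigma
    if base <= 0:
        # For xi < 0 the distribution has a finite upper endpoint.
        return 0.0
    return base ** (-1.0 / xi)


def prob_faster_than(target_time, threshold_time, sigma, xi):
    """Probability that a swim faster than threshold_time is also faster than target_time.

    Fast times are small, so the excess is measured on the negated scale:
    excess = threshold_time - target_time.
    """
    return gpd_survival(threshold_time - target_time, sigma, xi)


# Hypothetical values: threshold 59.0 s, target 58.0 s, scale 1.0, shape -0.5.
p = prob_faster_than(58.0, 59.0, sigma=1.0, xi=-0.5)
print(p)  # 0.25
```

A negative shape is used in the example because it implies a finite limit on how fast a time can be, which is a common assumption for athletic performance data; whether that holds in a given analysis is for the data to decide.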