Binary time series data are common in many applications and are typically modelled as independent draws from a Bernoulli process with a single probability of success. However, the probability of a success can depend on the outcomes of past events. Presented here is a novel approach for modelling binary time series data, called a binary de Bruijn process, which takes temporal correlation into account. The structure is derived from de Bruijn graphs: directed graphs in which, given a set of symbols, V, and a 'word' length, m, the nodes consist of all possible sequences of length m over V. De Bruijn graphs are equivalent to mth order Markov chains, where the 'word' length controls the number of past states on which each state depends. Increasing the word length therefore lets correlation extend over a longer range. To quantify how clustered a sequence generated from a de Bruijn process is, the run lengths of letters are examined along with their properties. Inference is also presented, along with two application examples: precipitation data and the Oxford and Cambridge boat race.
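The construction described above can be illustrated with a minimal Python sketch. It builds the de Bruijn graph for a symbol set and word length m, simulates a binary sequence in which the probability of the next letter depends on the current length-m word, and extracts run lengths. The function names and the `p_one` parameterisation (one probability of a 1 per word) are assumptions made for illustration; they are not taken from the paper, whose actual parameterisation and inference procedure are not given here.

```python
import itertools
import random


def de_bruijn_graph(symbols, m):
    """Map each length-m word to the words reached by shifting in one symbol."""
    nodes = itertools.product(symbols, repeat=m)
    return {w: [w[1:] + (s,) for s in symbols] for w in nodes}


def simulate(p_one, m, n, seed=0):
    """Simulate n binary letters.

    p_one[word] is the (hypothetical) probability that the next letter is 1,
    given that the last m letters form `word`.
    """
    rng = random.Random(seed)
    word = tuple(rng.choice((0, 1)) for _ in range(m))  # arbitrary start word
    seq = list(word)
    while len(seq) < n:
        nxt = 1 if rng.random() < p_one[word] else 0
        seq.append(nxt)
        word = word[1:] + (nxt,)  # move along an edge of the de Bruijn graph
    return seq[:n]


def run_lengths(seq):
    """Return (letter, run length) pairs for consecutive equal letters."""
    return [(letter, len(list(group))) for letter, group in itertools.groupby(seq)]


# Usage: a "sticky" process with m = 2 that tends to repeat the latest letter.
m = 2
graph = de_bruijn_graph((0, 1), m)
p_one = {w: 0.9 if w[-1] == 1 else 0.1 for w in graph}
sequence = simulate(p_one, m, n=50, seed=1)
print(run_lengths(sequence))
```

In this sketch, probabilities close to 0 or 1 produce long runs of the same letter, while setting every entry of `p_one` to the same value recovers an independent Bernoulli sequence after the start word.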