Time series clustering is a central machine learning task with applications in many fields. While most methods focus on real-valued time series, very few works consider series with a discrete response. In this paper, the problem of clustering ordinal time series is addressed. To this end, two novel distances between ordinal time series are introduced and used to construct fuzzy clustering procedures. Both metrics are functions of the estimated cumulative probabilities, thus automatically taking advantage of the ordering inherent to the series' range. The resulting clustering algorithms are computationally efficient and able to group series generated from similar stochastic processes, achieving accurate results even when the series come from a wide variety of models. Since the dynamics of the series may vary over time, we adopt a fuzzy approach, thus enabling the procedures to assign each series to several clusters with different membership degrees. An extensive simulation study shows that the proposed methods outperform several alternative procedures. Weighted versions of the clustering algorithms are also presented and their advantages with respect to the original methods are discussed. Two specific applications involving economic time series illustrate the usefulness of the proposed approaches.
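The general idea of building a distance from estimated cumulative probabilities and passing it to a fuzzy clustering procedure can be illustrated with a minimal Python sketch. The abstract does not give the exact form of the two distances or of the clustering model, so everything specific here is an assumption made for illustration: categories are coded as integers `0, ..., n_states - 1`, the features are relative-frequency estimates of the cumulative marginal probabilities and of the lag-1 cumulative joint probabilities, the distance is the squared Euclidean distance between feature vectors, and the clustering step is a plain fuzzy C-medoids iteration with fuzziness `m = 2` and a deterministic farthest-first initialization. The function names `cumulative_features`, `pairwise_sq_dist`, and `fuzzy_c_medoids` are hypothetical.

```python
import numpy as np


def cumulative_features(series, n_states, max_lag=1):
    """Relative-frequency estimates of cumulative probabilities (assumed features)."""
    x = np.asarray(series)
    thresholds = np.arange(n_states - 1)        # the last cumulative probability is always 1
    below = x[:, None] <= thresholds[None, :]   # below[t, i] is True when x_t <= category i
    feats = [below.mean(axis=0)]                # estimates of P(X_t <= s_i)
    for lag in range(1, max_lag + 1):
        now, past = below[lag:], below[:-lag]
        # estimates of P(X_t <= s_i, X_{t-lag} <= s_j)
        joint = (now[:, :, None] & past[:, None, :]).mean(axis=0)
        feats.append(joint.ravel())
    return np.concatenate(feats)


def pairwise_sq_dist(features):
    """Squared Euclidean distance between all pairs of feature vectors."""
    diff = features[:, None, :] - features[None, :, :]
    return (diff ** 2).sum(axis=2)


def _memberships(dist, medoids, m):
    d = np.maximum(dist[:, medoids], 1e-12)     # avoid division by zero at the medoids
    inv = d ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)  # each row sums to 1


def fuzzy_c_medoids(dist, n_clusters, m=2.0, n_iter=50):
    """Plain fuzzy C-medoids on a precomputed distance matrix (illustrative only)."""
    # Farthest-first initialization: start from the most central series.
    chosen = [int(dist.sum(axis=1).argmin())]
    while len(chosen) < n_clusters:
        chosen.append(int(dist[:, chosen].min(axis=1).argmax()))
    medoids = np.array(chosen)
    for _ in range(n_iter):
        u = _memberships(dist, medoids, m)
        cost = (u ** m).T @ dist                # cost[c, q] = sum_i u[i, c]^m * dist[i, q]
        new_medoids = cost.argmin(axis=1)
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return _memberships(dist, medoids, m), medoids


# Toy data: two groups of i.i.d. ordinal series with four categories.
rng = np.random.default_rng(0)
low = [rng.choice(4, size=300, p=[0.6, 0.25, 0.1, 0.05]) for _ in range(5)]
high = [rng.choice(4, size=300, p=[0.05, 0.1, 0.25, 0.6]) for _ in range(5)]
features = np.array([cumulative_features(s, n_states=4) for s in low + high])
u, medoids = fuzzy_c_medoids(pairwise_sq_dist(features), n_clusters=2)
print(np.round(u, 3))
```

Each row of `u` holds the membership degrees of one series across the clusters, which is what allows a series to belong to several clusters at once. Because the features are cumulative rather than category-wise probabilities, two series that differ by adjacent categories end up closer than two series that differ by distant categories.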