Time series data are now ubiquitous. Most of the literature on the topic deals with real-valued time series, whereas categorical time series have received much less attention, although the development of data mining techniques for this kind of data has increased substantially in recent years. The R package ctsfeatures offers a set of tools for analyzing categorical time series. In particular, it provides functions for extracting well-known statistical features and for constructing graphs that describe the underlying temporal patterns. The output of some functions can be used to perform traditional machine learning tasks, including clustering, classification and outlier detection. The package also includes two datasets of biological sequences introduced in the literature for clustering purposes, as well as three synthetic databases. This work describes the main characteristics of the package and illustrates its use through various examples. Practitioners from a wide variety of fields could benefit from the tools provided by ctsfeatures.