Recent work has proposed Wasserstein k-means (Wk-means) clustering as a powerful method for identifying regimes in time series data, and in one-dimensional asset returns in particular. In this paper, we begin by studying in detail the behaviour of the Wk-means clustering algorithm applied to synthetic one-dimensional time series data. We study the dynamics of the algorithm and investigate how varying its hyperparameters affects clustering performance across different random initialisations. We compute simple metrics that we find useful for identifying high-quality clusterings. We then extend Wk-means clustering to multidimensional time series data by approximating the multidimensional Wasserstein distance with a sliced Wasserstein distance, resulting in a method we call "sliced Wasserstein k-means (sWk-means) clustering". We apply the sWk-means method to the problem of automated regime detection in multidimensional time series data, using synthetic data to demonstrate the validity of the approach. Finally, we show that the sWk-means method is effective in identifying distinct market regimes in real multidimensional financial time series, using publicly available foreign exchange spot rate data as a case study. We conclude with remarks on some limitations of our approach and on potential complementary or alternative approaches.
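To make the central approximation concrete, the following is a minimal sketch of a Monte Carlo sliced Wasserstein distance between two multidimensional samples. It is an illustration under stated assumptions, not the paper's implementation: the function names `wasserstein_1d` and `sliced_wasserstein`, the choice p = 2, the number of projections, and the requirement that both samples have the same size are all assumptions made here for simplicity. The idea is to project both samples onto random unit directions, compute the one-dimensional Wasserstein distance along each direction (which reduces to comparing sorted values), and average over directions.

```python
import numpy as np


def wasserstein_1d(x, y, p=2):
    """p-Wasserstein distance between two equal-size 1-D empirical samples."""
    x_sorted = np.sort(np.asarray(x, dtype=float))
    y_sorted = np.sort(np.asarray(y, dtype=float))
    if x_sorted.shape != y_sorted.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    # In one dimension the optimal coupling matches order statistics.
    return np.mean(np.abs(x_sorted - y_sorted) ** p) ** (1.0 / p)


def sliced_wasserstein(X, Y, n_projections=200, p=2, seed=0):
    """Monte Carlo estimate of the sliced p-Wasserstein distance.

    X, Y: arrays of shape (n_samples, d) with the same n_samples
    (an assumption of this sketch, not a requirement of the method).
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    if X.shape != Y.shape:
        raise ValueError("this sketch assumes equal sample shapes")
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Directions drawn uniformly on the unit sphere (normalised Gaussians).
    directions = rng.normal(size=(n_projections, d))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    # Project onto every direction, then sort each projected sample.
    X_proj = np.sort(X @ directions.T, axis=0)
    Y_proj = np.sort(Y @ directions.T, axis=0)
    # 1-D Wasserstein cost (to the power p) for each direction.
    per_direction = np.mean(np.abs(X_proj - Y_proj) ** p, axis=0)
    return np.mean(per_direction) ** (1.0 / p)


# Usage: two hypothetical windows of 3-dimensional returns.
rng = np.random.default_rng(42)
calm = rng.normal(scale=0.01, size=(100, 3))
stressed = rng.normal(scale=0.03, size=(100, 3))
print(sliced_wasserstein(calm, stressed))
```

In a k-means-style loop, a distance of this kind would stand in for the Euclidean distance when assigning each window of returns to its nearest cluster centre; how centres are updated is a separate design choice that this sketch does not address.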