Continuous monitoring with an ever-increasing number of sensors has become ubiquitous across many application domains. However, the acquired time series are typically high-dimensional and difficult to interpret. Expressive deep learning (DL) models have gained popularity for dimensionality reduction, but the resulting latent space often remains difficult to interpret. In this work we propose SOM-CPC, a model that visualizes data in an organized 2D manifold while preserving higher-dimensional information. We address a largely unexplored and challenging set of scenarios comprising high-rate time series, and show on both synthetic and real-life data (physiological data and audio recordings) that SOM-CPC outperforms strong baselines, such as DL-based feature extraction followed by conventional dimensionality reduction techniques, and models that jointly optimize a DL model and a Self-Organizing Map (SOM). SOM-CPC has great potential to help uncover and better understand latent patterns in high-rate data streams.