A key operation in the analysis of massive data series collections is similarity search. According to recent studies, SAX-based indexes offer state-of-the-art performance for similarity search tasks. However, their performance deteriorates on datasets that exhibit high-frequency, weakly correlated, or excessively noisy series, among other dataset-specific properties. In this work, we propose Deep Embedding Approximation (DEA), a novel family of data series summarization techniques based on deep neural networks. Moreover, we describe SEAnet, a novel architecture especially designed for learning DEA, which introduces the Sum of Squares preservation property into the deep network design. We further enhance SEAnet with the SEAtrans encoder. Finally, we propose novel sampling strategies, SEAsam and SEAsamE, that allow SEAnet to train effectively on massive datasets. Comprehensive experiments on 7 diverse synthetic and real datasets verify the advantages of DEA learned using SEAnet in providing high-quality data series summarizations and similarity search results.