Similarity search is a key operation in the analysis of massive data series collections. According to recent studies, SAX-based indexes offer state-of-the-art performance for similarity search tasks. However, their performance degrades on datasets that are high-frequency, weakly correlated, or excessively noisy, or that exhibit other dataset-specific properties. In this work, we propose Deep Embedding Approximation (DEA), a novel family of data series summarization techniques based on deep neural networks. Moreover, we describe SEAnet, a novel architecture specifically designed for learning DEA, which introduces the Sum of Squares preservation property into the deep network design. We further enhance SEAnet with the SEAtrans encoder. Finally, we propose two novel sampling strategies, SEAsam and SEAsamE, that allow SEAnet to train effectively on massive datasets. Comprehensive experiments on 7 diverse synthetic and real datasets verify the advantages of DEA learned using SEAnet in providing high-quality data series summarizations and similarity search results.