Sequence classification has a wide range of real-world applications across domains, such as genome classification in health and anomaly detection in business. However, sequence data lack explicit features, which makes them difficult for machine learning models to handle. Neural Network (NN) models address this by learning features automatically, but they are limited to capturing adjacent structural connections and ignore global, higher-order information between sequences. To address these challenges, we propose a novel Hypergraph Attention Network model, Seq-HyGAN. To capture the complex structural similarity between sequences, we first construct a hypergraph in which sequences are represented as hyperedges and the subsequences extracted from them are represented as nodes. We then introduce an attention-based Hypergraph Neural Network model with a two-level attention mechanism. This model generates a representation of each sequence as a hyperedge while simultaneously learning which subsequences are crucial for that sequence. We conduct extensive experiments on four data sets to evaluate our model against several state-of-the-art methods. The results demonstrate that Seq-HyGAN classifies sequence data effectively and significantly outperforms the baselines. We also conduct case studies to investigate the contribution of each module in Seq-HyGAN.
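To make the hypergraph construction concrete, the following is a minimal sketch rather than the Seq-HyGAN implementation. It assumes that subsequences are fixed-length k-mers (the abstract does not say how subsequences are extracted) and that attention scores are dot products with a context vector. The names `build_hypergraph`, `attention_pool`, `k`, and `context` are hypothetical, and only a single node-to-hyperedge attention step is shown, not the full two-level mechanism.

```python
import math


def build_hypergraph(sequences, k=3):
    """Treat each distinct k-mer as a node and each sequence as one hyperedge."""
    node_ids = {}    # k-mer -> node index (assumed subsequence definition)
    hyperedges = []  # hyperedges[i] = sorted node indices found in sequence i
    for seq in sequences:
        members = set()
        for start in range(len(seq) - k + 1):
            kmer = seq[start:start + k]
            if kmer not in node_ids:
                node_ids[kmer] = len(node_ids)
            members.add(node_ids[kmer])
        hyperedges.append(sorted(members))
    return node_ids, hyperedges


def attention_pool(hyperedge, node_features, context):
    """Softmax-weighted average of the node features inside one hyperedge."""
    dim = len(context)
    if not hyperedge:  # sequence shorter than k: no nodes to pool
        return [0.0] * dim, []
    # Assumed scoring function: dot product between node feature and context.
    scores = [sum(f * c for f, c in zip(node_features[n], context))
              for n in hyperedge]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [sum(w * node_features[n][d] for w, n in zip(weights, hyperedge))
              for d in range(dim)]
    return pooled, weights


# Toy usage: two overlapping DNA-like sequences and one unrelated sequence.
sequences = ["ACGTAC", "CGTACG", "TTTT"]
node_ids, hyperedges = build_hypergraph(sequences, k=3)

# One-hot node features stand in for learned embeddings.
num_nodes = len(node_ids)
node_features = [[1.0 if i == j else 0.0 for j in range(num_nodes)]
                 for i in range(num_nodes)]
context = [0.0] * num_nodes
context[node_ids["ACG"]] = 2.0  # pretend the model finds "ACG" important

pooled, weights = attention_pool(hyperedges[0], node_features, context)
```

In this toy example the first two sequences share all of their 3-mers, so their hyperedges contain the same nodes, while the third sequence forms a hyperedge of its own. The attention weights indicate which subsequences contribute most to a sequence's pooled representation. In a trained model the features and the context vector would be learned rather than set by hand.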