This study explores the application of self-supervised learning techniques to event sequences, a key modality in applications such as banking, e-commerce, and healthcare. However, research on self-supervised learning for event sequences is limited, and methods from other domains such as images, text, and speech may not transfer easily. To determine the most suitable approach, we conduct a detailed comparative analysis of previously identified best-performing methods. We find that neither the contrastive nor the generative approach is consistently superior. Our assessment covers event-sequence classification, next-event prediction, and embedding quality. These results further highlight the potential benefits of combining both methods. Given the lack of research on hybrid models in this domain, we first adapt a baseline model from another domain. Upon observing its underperformance, we develop a novel method, the Multimodal-Learning Event Model (MLEM). MLEM treats contrastive learning and generative modeling as distinct yet complementary modalities and aligns their embeddings. Our results demonstrate that combining the contrastive and generative approaches into a single procedure with MLEM achieves superior performance across multiple metrics.
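The abstract describes combining a contrastive objective, a generative (reconstruction) objective, and an alignment term between the two embedding spaces. The following is a minimal sketch of one way such a combined objective could look; the specific losses (InfoNCE for the contrastive term, mean-squared error for the generative and alignment terms) and the weighting scheme are illustrative assumptions, not the actual MLEM formulation.

```python
import numpy as np

def info_nce(z_a, z_b, temperature=0.1):
    """Contrastive InfoNCE loss: matching rows of z_a and z_b are positive pairs."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature
    # Row-wise log-softmax; the diagonal holds each positive pair.
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

def alignment_loss(z_contrastive, z_generative):
    """Mean-squared distance pulling the two embedding spaces together."""
    return np.mean((z_contrastive - z_generative) ** 2)

def total_loss(z_c, z_c_aug, z_g, x, x_recon, w_align=1.0):
    """Combine contrastive, generative, and alignment terms into one objective."""
    contrastive = info_nce(z_c, z_c_aug)       # contrastive term (two views)
    generative = np.mean((x - x_recon) ** 2)   # reconstruction term
    align = alignment_loss(z_c, z_g)           # embedding-alignment term
    return contrastive + generative + w_align * align
```

In this sketch, `z_c` and `z_c_aug` are contrastive embeddings of two views of the same sequence, `z_g` is the generative model's embedding, and `x_recon` is its reconstruction of the input `x`; the weight `w_align` controlling the strength of the alignment term is a hypothetical hyperparameter.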